regex - Is there a good way to find exact matches of an extremely long string (~500 characters) in a CSV file a couple of megabytes in size?
I'm trying to find a match for a ~500-character-long DNA sequence in a CSV file a few megabytes in size that contains different sequences. Before each sequence in the CSV file, there is some metadata that I would like to have. Each sequence and its metadata take up 1 line. I've tried
grep -B 1 "extremelylongstringofdnatacggcatagaggccgagacctaggattaacgttactgacgat" csvfile.csv
However, it returns an error saying that the file name is too long.
An interesting and frustrating thing I bumped into: when I tried to find the line count of the CSV file using
wc -l csvfile.csv
it returned
0 csvfile.csv
and without the -l flag, it returned
0 161410 41507206 csvfile.csv
This is the result after I added a line between the end of each sequence and the start of the metadata of the next sequence.
The issue was that the file had CR line terminators, and the GNU tools were not detecting those line endings and were therefore reading the file as one huge line. I solved the issue by using mac2unix to convert the file so that its line endings are readable by GNU tools.
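For anyone hitting the same symptom, here is a minimal sketch of how the diagnosis and the search could be done without converting the file on disk. It is not the method used above (that was mac2unix); it swaps in tr to translate CR to LF on the fly. The function names count_terminators and find_sequence are made up for illustration, and the sketch assumes the file uses CR-only terminators (a CRLF file would end up with blank lines in between).

```shell
# Hypothetical helper: count CR (\r) and LF (\n) bytes in a file.
# A large file reporting LF=0 is what makes `wc -l` print 0.
count_terminators() {
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    echo "CR=$cr LF=$lf"
}

# Hypothetical helper: search a CR-terminated file for a literal sequence.
#   $1 = sequence to look for, $2 = file
# tr turns every CR into LF so grep sees real lines; -F treats the
# pattern as a fixed string rather than a regex, and -B 1 also prints
# the line before each match (assumed here to hold the metadata).
find_sequence() {
    tr '\r' '\n' < "$2" | grep -B 1 -F -- "$1"
}

# Example usage (csvfile.csv as in the question):
#   count_terminators csvfile.csv
#   find_sequence "extremelylongstringofdna..." csvfile.csv
```

Since the query is an exact string, -F avoids any regex interpretation of the pattern. If the file will be searched repeatedly, converting it once, as described above, is probably the simpler choice.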
Thanks to Etan Reisner for providing the hint.