regex - Is there a good way to find exact matches of an extremely long string (~500 characters) in a CSV file a couple of megabytes in size?


I'm trying to find a match of a ~500-character-long DNA sequence in a CSV file a few megabytes large that contains different sequences. Before each sequence in the CSV file, there is metadata that I would like to have. Each sequence and each sequence's metadata take one line. I've tried

grep -b 1 "extremelylongstringofdnatacggcatagaggccgagacctaggattaacgttactgacgat" csvfile.csv 

However, it returns that the filename is too long.

An interesting and frustrating thing I bumped into came when I tried to find the line count of the CSV file using

wc -l csvfile.csv 

It returned

0 csvfile.csv 

and without the -l flag, it returned

0  161410 41507206 csvfile.csv 

This is the result after I added a line between the end of each sequence and the start of the metadata of the next sequence.

The issue was that the file had CR line terminators, and the GNU tools were not detecting the line endings and were therefore reading the file as one huge line. I solved the issue by using mac2unix to convert the file so that its line endings are readable by the GNU tools.
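
For anyone hitting the same thing, here is a minimal sketch of the diagnosis and fix on a tiny made-up file. The file names sample.csv and sample_unix.csv and the short sequences are hypothetical, and tr is swapped in for mac2unix only so the example has no extra dependencies. The -B 1 option assumes the metadata sits on the line directly before the sequence; -F makes grep treat the pattern as a fixed string rather than a regex.

```shell
# Build a small sample with CR-only line endings.
# sample.csv and its contents are made up for illustration.
printf 'seq1,meta1\rACGTACGT\rseq2,meta2\rTTGACCGA\r' > sample.csv

# wc -l counts LF characters, so a CR-only file reports 0 lines.
wc -l < sample.csv

# Convert every CR to LF (tr used here in place of mac2unix).
tr '\r' '\n' < sample.csv > sample_unix.csv

# The converted file now reports one line per record.
wc -l < sample_unix.csv

# Fixed-string search; -B 1 also prints the line before the match,
# assumed here to be the metadata line.
grep -F -B 1 'TTGACCGA' sample_unix.csv
```

On a real file, running the file utility on it first is a quick way to see whether it reports CR line terminators before converting anything.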

Thanks to Etan Reisner for providing the hint.

