I have a CSV that looks like this:

    f02303521,"smith,andy",ghi,"smith,andy",ghi,,,
    f04300621,"parker,helen",cert,"yu,betty",ious,,,

I want to delete the lines where the 2nd column is equal to the 4th column (ex. when smith,andy = smith,andy). I tried to do this in Python by using " as the delimiter and splitting the columns into:

    f02303521, smith,andy ,ghi, smith,andy ,ghi,,,

I tried this Python code:
    testcsv = 'test.csv'
    deletiontext = 'linestodelete.txt'
    correct = 'correctone.csv'
    i = 0
    j = 0
    # where i & j keep track of the line number
    with open(deletiontext, 'w') as outfile:
        with open(testcsv, 'r') as csv:
            for line in csv:
                i = i + 1  # on the first line, i will equal 1
                pi = line.split('"')[1]
                investigator = line.split('"')[3]
                # if they equal each other, write that line number to the text file to be deleted
                if pi == investigator:
                    outfile.write(i)

    # from the txt, create a list of line numbers you do not want to include in the output
    with open(deletiontext, 'r') as txt:
        lines_to_be_removed_list = []
        # for each line number in the txt
        # remove the return character at the end of the line
        # and add the line number to the domains-to-be-removed list
        for linenum in txt:
            linenum = linenum.rstrip()
            lines_to_be_removed_list.append(linenum)

    with open(correct, 'w') as outfile:
        with open(deletiontext, 'r') as csv:
            # for each line in the csv
            # extract the line number
            for line in csv:
                j = j + 1  # for the first line, the line number is 1
                # if the csv line number is not in the lines-to-be-removed list,
                # write it to the outfile
                if (j not in lines_to_be_removed_list):
                    outfile.write(line)

But at this line:

    pi = line.split('"')[1]

I get:

    Traceback (most recent call last):
      File "c:/users/sskadamb/pycharmprojects/vastdeleteline/manipulation.py", line 11, in <module>
        pi = line.split('"')[1]
    IndexError: list index out of range

And I thought it would give pi = smith,andy and investigator = smith,andy... why does that not happen?

Any help is appreciated, thanks!
When I think CSV, I think pandas, which is a great data analysis library for Python. Here's how to accomplish what you want:

    import pandas as pd

    # The file has no header row, so name the 8 columns field0..field7
    fields = ['field{}'.format(i) for i in range(8)]
    df = pd.read_csv("data.csv", header=None, names=fields)
    # Keep only the rows where the 2nd and 4th columns differ
    df = df[df['field1'] != df['field3']]
    print(df)

This prints:

          field0        field1 field2    field3 field4  field5  field6  field7
    1  f04300621  parker,helen   cert  yu,betty   ious     NaN     NaN     NaN
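
If pulling in pandas is more than you need, the standard library's csv module also handles the quoted commas. Below is a minimal sketch, assuming Python 3 and the 8-column layout from the question; the helper name drop_matching_rows and the inline sample data are made up for illustration. It also points at one likely cause of the IndexError: any line with no double quote in it (a blank trailing line, for example) splits into a single-element list, so index 1 does not exist.

```python
import csv
import io


def drop_matching_rows(rows, left=1, right=3):
    """Yield only the rows whose `left` and `right` columns differ.

    Hypothetical helper; the defaults compare the 2nd and 4th columns.
    """
    for row in rows:
        # csv.reader returns [] for a blank line; rows too short to
        # compare are skipped instead of raising IndexError
        if len(row) <= max(left, right):
            continue
        if row[left] != row[right]:
            yield row


# Sample data from the question, plus a blank trailing line
sample = (
    'f02303521,"smith,andy",ghi,"smith,andy",ghi,,,\n'
    'f04300621,"parker,helen",cert,"yu,betty",ious,,,\n'
    '\n'
)

# A line without any double quote splits into one piece only,
# which is why line.split('"')[1] can fail
assert len('\n'.split('"')) == 1

kept = list(drop_matching_rows(csv.reader(io.StringIO(sample))))

# Write the surviving rows back out; fields containing commas are re-quoted
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(kept)
print(out.getvalue())
```

For real files, swap the io.StringIO objects for open('test.csv', newline='') and open('correctone.csv', 'w', newline=''). Filtering in one pass like this avoids the intermediate file of line numbers altogether.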