Python - list index out of range, working with CSV?


I have a CSV that looks like this:

f02303521,"smith,andy",ghi,"smith,andy",ghi,,,
f04300621,"parker,helen",cert,"yu,betty",ious,,,

I want to delete the lines where the 2nd column is equal to the 4th column (e.g. when smith,andy = smith,andy). I tried to do this in Python by using " as the delimiter and splitting the columns into:

f02303521, smith,andy ,ghi, smith,andy ,ghi,,,

I tried this Python code:

testcsv = 'test.csv'
deletiontext = 'linestodelete.txt'
correct = 'correctone.csv'
i = 0
j = 0  # where i & j keep track of line number

with open(deletiontext, 'w') as outfile:
    with open(testcsv, 'r') as csv:
        for line in csv:
            i = i + 1  # on the first line, i will equal 1.
            pi = line.split('"')[1]
            investigator = line.split('"')[3]

        # if they equal each other, write that line number to the text file
        # of lines to be deleted.
        if pi == investigator:
            outfile.write(i)

# from the txt, create a list of line numbers you do not want to include in the output
with open(deletiontext, 'r') as txt:
    lines_to_be_removed_list = []

    # for each line number in the txt
    # remove the return character at the end of the line
    # and add the line number to the domains-to-be-removed list
    for linenum in txt:
        linenum = linenum.rstrip()
        lines_to_be_removed_list.append(linenum)

with open(correct, 'w') as outfile:
    with open(deletiontext, 'r') as csv:

        # for each line in the csv
        # extract the line number
        for line in csv:
            j = j + 1  # for the first line, the line number is 1

            # if the csv line number is not in the lines-to-be-removed list,
            # then write that to the outfile
            if (j not in lines_to_be_removed_list):
                outfile.write(line)

But at this line:

pi = line.split('"')[1]  

I get:

Traceback (most recent call last):
  File "c:/users/sskadamb/pycharmprojects/vastdeleteline/manipulation.py", line 11, in <module>
    pi = line.split('"')[1]
IndexError: list index out of range

I thought this would give pi = smith,andy and investigator = smith,andy. Why doesn't that happen?

Any help is appreciated, thanks!

When I think CSV, I think pandas, which is a great data analysis library for Python. Here's how to accomplish what you want:

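import pandas as pd

# The file has no header row, so name the 8 columns field0..field7.
fields = ['field{}'.format(i) for i in range(8)]
df = pd.read_csv("data.csv", header=None, names=fields)

# Keep only the rows where the 2nd and 4th columns differ.
df = df[df['field1'] != df['field3']]
print(df)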

This prints:

      field0        field1 field2    field3 field4  field5  field6  field7
1  f04300621  parker,helen   cert  yu,betty   ious     NaN     NaN     NaN
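
If you'd rather not pull in pandas, the standard-library csv module also copes with the commas inside the quoted names. Below is a minimal sketch of the same filter, not a drop-in replacement for your script: the function name drop_matching_rows and the inline sample data (with a blank line added on purpose) are made up for illustration. On the IndexError itself: line.split('"')[1] fails on any line that contains no double quote, such as a blank line, because split then returns a one-element list. That may be what is in your file, though I can't tell from the sample you posted.

```python
import csv
import io


def drop_matching_rows(lines, left=1, right=3):
    """Yield parsed rows whose `left` and `right` columns differ.

    `lines` is any iterable of CSV text lines (an open file works too).
    """
    for row in csv.reader(lines):
        # Blank or short rows are skipped rather than raising IndexError.
        if len(row) <= max(left, right):
            continue
        if row[left] != row[right]:
            yield row


# Hypothetical sample: the two rows from the question plus a blank line.
sample = io.StringIO(
    'f02303521,"smith,andy",ghi,"smith,andy",ghi,,,\n'
    '\n'
    'f04300621,"parker,helen",cert,"yu,betty",ious,,,\n'
)
kept = list(drop_matching_rows(sample))

# Write the surviving rows back out as CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(kept)
result = out.getvalue()
print(result)
```

To run it on real files, pass open('test.csv', newline='') instead of the StringIO object and hand the writer a file opened with newline='' as well.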
