python - Is this correct behavior for read_csv and a data value of NA?


(I have also opened an issue on GitHub.)

The following behavior doesn't seem correct to me. It seems that if read_csv is called with na_values=False, no values, including 'NA', should be interpreted as NaN, but that does not appear to be the case.

This behavior was noticed in this post (see the comments on the answer by @jianxunli), where 'NA' means 'North America'. I was unable to find a way to read it in without having it changed to NaN, and there should be a way to do this.

Here's an example CSV.

    %more foo.txt
    x,y
    "NA",NA
    "foo",foo

I'm including 'NA' both in quotes and outside of them to see if it matters. As you can see below, it doesn't seem to.

    pd.read_csv('foo.txt')
    Out[56]:
         x    y
    0  NaN  NaN
    1  foo  foo

    pd.read_csv('foo.txt', na_values=False)
    Out[57]:
         x    y
    0  NaN  NaN
    1  foo  foo

    pd.read_csv('foo.txt', na_values='foo')
    Out[58]:
         x    y
    0  NaN  NaN
    1  NaN  NaN

It also appears that data values of 'NaN' are treated the same as 'NA'.

Edit to add: I think I understand this better based on @Marius's answer, although it still doesn't seem right to me (the default behavior, that is, not Marius's answer, which seems to be a correct explanation of what is happening).

    na_values=False   =>  NA and NaN are treated as NaN
    na_values='foo'   =>  NA, NaN, and foo are treated as NaN
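The additive behavior summarized above can be checked without a file on disk. The following is only a sketch: it feeds the same two data rows as foo.txt through io.StringIO, and the names CSV_TEXT and read are made up for illustration. It covers the default call and na_values='foo', and leaves na_values=False out.

```python
import io

import pandas as pd

# Same content as foo.txt, held in memory (CSV_TEXT is an illustrative name).
CSV_TEXT = 'x,y\n"NA",NA\n"foo",foo\n'


def read(**kwargs):
    """Parse the in-memory CSV with whatever read_csv options are passed."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


default = read()                  # built-in NA strings only
with_foo = read(na_values='foo')  # 'foo' is added to the built-in NA strings

# isna() marks the cells that were parsed as missing.
print(default.isna())
print(with_foo.isna())
```

In the second call the 'NA' cell is still reported as missing even though only 'foo' was passed, which is the point of the summary: na_values extends the default list rather than replacing it.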

I guess I can understand this being the default behavior in a numeric column, but it doesn't seem like it should be the default for a string column. I would have struggled to figure this out from the documentation without seeing Marius's answer.

Edit to add (2):

Also, for comparison, I read this into Stata and Excel, and in both cases they treat 'NA' as plain text, not as NaN/missing. Is there any other package or library that has the same default behavior as pandas here?

You need keep_default_na=False. By default, the strings you include in na_values are added to the standard set of NA strings, e.g. NA, NaN:

    pd.read_csv('foo.txt', keep_default_na=False)
    Out[5]:
         x    y
    0   NA   NA
    1  foo  foo
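As a rough sketch of how the two options combine, the snippet below reads the same data from memory instead of foo.txt. The names CSV_TEXT, kept and only_foo are illustrative. With keep_default_na=False, only the strings listed in na_values should be treated as missing, so 'NA' survives as text.

```python
import io

import pandas as pd

# Same rows as foo.txt, held in memory (CSV_TEXT is an illustrative name).
CSV_TEXT = 'x,y\n"NA",NA\n"foo",foo\n'

# No default NA strings and no custom ones: every cell stays a string.
kept = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

# No default NA strings, but 'foo' is explicitly declared as missing.
only_foo = pd.read_csv(
    io.StringIO(CSV_TEXT),
    keep_default_na=False,
    na_values=['foo'],
)

print(kept)
print(only_foo)
```

One thing to keep in mind: with keep_default_na=False, empty fields are no longer converted to NaN either, so list '' in na_values if blanks should still count as missing.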
