(I have opened an issue on GitHub.)
The following behavior doesn't seem correct to me. It seems as if, by default, read_csv has na_values=False, in which case no values, including 'NA', should be interpreted as NaN. That does not appear to be the case.
I noticed this behavior in this post (see the comments on the answer by @JianxunLi), where 'NA' means 'North America'. I was unable to find a way to read the data in without 'NA' being changed to NaN, but there should be a way to do this.
Here's an example CSV:
```
%more foo.txt
x,y
"NA",NA
"foo",foo
```

I'm including 'NA' both in quotes and outside them to see if it matters; as you can see below, it doesn't seem to.
```
pd.read_csv('foo.txt')
Out[56]:
     x    y
0  NaN  NaN
1  foo  foo

pd.read_csv('foo.txt',na_values=False)
Out[57]:
     x    y
0  NaN  NaN
1  foo  foo

pd.read_csv('foo.txt',na_values='foo')
Out[58]:
     x    y
0  NaN  NaN
1  NaN  NaN
```

It appears that data values of 'NA' are treated the same as 'NaN'.
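For anyone who wants to reproduce this without creating foo.txt, here is a minimal sketch that reads the same data from an in-memory string (io.StringIO stands in for the file, and CSV_TEXT is just a name I picked for it). It also passes dtype=str, which, as far as I can tell, does not stop 'NA' from being converted either.

```python
import io

import pandas as pd

# The example file from above, held in memory so the snippet is self-contained.
CSV_TEXT = 'x,y\n"NA",NA\n"foo",foo\n'

# Default call: 'NA' (quoted or not) comes back as NaN.
default_df = pd.read_csv(io.StringIO(CSV_TEXT))

# na_values='foo' adds 'foo' to the missing-value strings; 'NA' is still converted.
extra_df = pd.read_csv(io.StringIO(CSV_TEXT), na_values="foo")

# Asking for string columns does not change how 'NA' is interpreted.
str_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

print(default_df.isna())
print(extra_df.isna())
print(str_df.isna())
```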
Edit to add: I think I understand this better now, based on @Marius's answer, although it still doesn't seem right to me (the default behavior, that is, not Marius's answer, which seems to be a correct explanation of what is happening).
na_values=False => NA and NaN are treated as NaN
na_values='foo' => NA, NaN, and foo are treated as NaN

I guess I can understand this being the default behavior in a numeric column, but it doesn't seem like it should be the default for a string column. I would have struggled to figure this out from the documentation without seeing Marius's answer.
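The rule summarized above (the strings you pass are added to a built-in list rather than replacing it) can be written as a toy model. This is only a sketch, not pandas code: DEFAULT_NA is a hypothetical, abbreviated subset of the real built-in list, effective_na_strings is a made-up helper name, and the dict form of na_values (per-column settings) is not modeled. The keep_default_na switch is the parameter named in the answer further down.

```python
# Hypothetical, abbreviated stand-in for pandas' built-in NA strings;
# the real list is longer and lives inside pandas itself.
DEFAULT_NA = {"", "NA", "N/A", "NaN", "nan", "NULL", "null"}


def effective_na_strings(na_values=None, keep_default_na=True):
    """Toy model of how read_csv combines na_values with its defaults."""
    if na_values is None:
        user = set()
    elif isinstance(na_values, str):
        user = {na_values}  # a single string is treated as a one-item list
    else:
        user = set(na_values)
    # Defaults are kept (and unioned with user strings) unless explicitly dropped.
    return (DEFAULT_NA | user) if keep_default_na else user


print(sorted(effective_na_strings("foo")))
```

With the defaults kept, 'NA' is always in the set, which is why na_values='foo' converts both 'NA' and 'foo'.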
Edit to add (2):
Also, for comparison, I read the file into Stata and Excel, and in both cases 'NA' is treated as plain text, not as NaN/missing. Is there any other package or library that has the same default behavior as pandas here?
You need keep_default_na=False. By default, the strings you include in na_values are added to a standard set of NA strings (e.g. NA, NaN) rather than replacing it:
```
pd.read_csv('foo.txt', keep_default_na=False)
Out[5]:
     x    y
0   NA   NA
1  foo  foo
```
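A self-contained sketch of the same fix, again reading the question's data from an in-memory string (CSV_TEXT is just an illustrative name). It also shows keep_default_na=False combined with na_values: in that case only the strings you list are treated as missing.

```python
import io

import pandas as pd

CSV_TEXT = 'x,y\n"NA",NA\n"foo",foo\n'

# Drop the built-in NA strings entirely: 'NA' stays a literal string.
literal_df = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

# Drop the built-ins but supply your own list: only 'foo' becomes NaN.
only_foo_df = pd.read_csv(
    io.StringIO(CSV_TEXT), keep_default_na=False, na_values=["foo"]
)

print(literal_df)
print(only_foo_df)
```

One side effect to keep in mind: the empty string is part of the built-in list, so with keep_default_na=False empty fields should come back as '' rather than NaN unless you add '' to na_values yourself.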