import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])

Output:

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

The values are stored as float64.

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print(df['one'])

Output:

a    1
b    2
c    3
Name: one, dtype: int64

But now the values are stored as int64.

The difference is that in the first example there is a NaN among the values.

What is the rule that determines the data types in the above examples?

Thanks!

gaganso
searain

2 Answers


The type of NaN is float, so pandas will upcast all the int values to floats too.

This can be easily checked:

>>> import numpy as np
>>> type(np.nan)
<class 'float'>

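You can see the upcast happen in isolation: reindexing an all-int Series so that a missing label introduces NaN forces the whole Series to float64 (a small sketch):

```python
import pandas as pd

# An all-int Series keeps the int64 dtype.
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(s.dtype)  # int64

# Reindexing with a label that has no value introduces NaN,
# and since NaN is a float, the whole Series is upcast to float64.
s2 = s.reindex(['a', 'b', 'c', 'd'])
print(s2.dtype)  # float64
```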
I would recommend this interesting read.

rafaelc

Pandas inherits many bad decisions from NumPy.

Refer to:

Pandas Gotchas - Integer NA

Numpy or Pandas, keeping array type as integer while having a nan value

If you look at type(df.iloc[3, 0]), you can see the NaN is of type numpy.float64, which forces type coercion of the entire column to floats. Basically, Pandas is garbage for dealing with nullable integers, and you just have to deal with them as floating point numbers. You can also use the object dtype to hold integers, if performance isn't a concern.
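Note that newer pandas versions (0.24+) also provide the nullable "Int64" extension dtype (capital I), which can hold integers alongside missing values without upcasting to float; a minimal sketch:

```python
import pandas as pd

# The nullable "Int64" extension dtype stores missing values
# without forcing the integer values to become floats.
s = pd.Series([1, 2, 3, None], dtype='Int64')
print(s.dtype)  # Int64

# The integer values stay integers; the missing slot is pandas' NA.
print(s.isna().sum())  # 1
```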

Joel Bondurant