I'm running Pandas in Python 3 and I noticed that the following:

import pandas as pd
import numpy as np
from pandas import DataFrame
from numpy import nan

df = DataFrame([[1, nan], [nan, 4], [5, 6]])

print(df)

df2 = df
df2.fillna(0)

print(df2)

Returns the following:

    0   1
0   1 NaN
1 NaN   4
2   5   6
    0   1
0   1 NaN
1 NaN   4
2   5   6

While the following:

import pandas as pd
import numpy as np
from pandas import Series
from numpy import nan

sr1 = Series([1,2,3,nan,5,6,7])

sr1.fillna(0)

Returns the following:

0    1
1    2
2    3
3    0
4    5
5    6
6    7
dtype: float64

So it's filling in Series values but not DataFrame values with 0 when I use .fillna(). What am I missing here to get 0s in place of null values in DataFrames?

smci
cenveoanalyst
  • Not what's going on here, but it may help somebody: you can't use df.fillna with df.mean (to replace missing values with column means) if the dtype is not numeric. Sounds obvious, but df.mean() on its own still works. – D A Wells Feb 10 '20 at 14:20

2 Answers

It has to do with the way you're calling the fillna() function.

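By default, fillna() returns a new DataFrame and leaves the original untouched, so the result is lost unless you assign it to something. If you pass inplace=True (see code below), the NaN values are filled in place and your original DataFrame is overwritten.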

In [1]: paste
import pandas as pd
import numpy as np
from pandas import DataFrame
from numpy import nan

df = DataFrame([[1, nan], [nan, 4], [5, 6]])
## -- End pasted text --

In [2]: df
Out[2]: 
    0   1
0   1 NaN
1 NaN   4
2   5   6

In [3]: df.fillna(0)
Out[3]: 
   0  1
0  1  0
1  0  4
2  5  6

In [4]: df2 = df

In [5]: df2.fillna(0)
Out[5]: 
   0  1
0  1  0
1  0  4
2  5  6

In [6]: df2  # note how this is unchanged.
Out[6]: 
    0   1
0   1 NaN
1 NaN   4
2   5   6

In [7]: df.fillna(0, inplace=True)  # this will replace the values.

In [8]: df
Out[8]: 
   0  1
0  1  0
1  0  4
2  5  6
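
As a compact recap of the session above, here is a minimal sketch of the two usual fixes. It also shows that df2 = df does not make a copy. The names filled and remaining are only illustrative and are not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

# Plain assignment does not copy: df2 is another name for the same object.
df2 = df
print(df2 is df)

# fillna() builds a new DataFrame; the result is discarded here,
# so the original still contains its NaN values.
df2.fillna(0)
remaining = int(df2.isna().sum().sum())
print(remaining)

# Fix 1: keep the returned DataFrame under a name.
filled = df.fillna(0)
print(filled)

# Fix 2: fill the existing object in place. Because df2 is the same
# object as df, the change is visible through both names.
df.fillna(0, inplace=True)
print(df2)
```

Fix 1 is usually the easier one to reason about, since the original data stays available; if df2 is meant to be an independent object, df2 = df.copy() would be needed instead of df2 = df.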

ericmjl

As you can read in the documentation, fillna(value) returns a new DataFrame that is identical to the original, except that the NaN values are replaced by the given value. The original DataFrame is not modified, so you have to assign the result to a variable.

from pandas import DataFrame
from numpy import nan

df = DataFrame([[1, nan], [nan, 2], [3, 2]])
df2 = df.fillna(0)

print(df2)
# Outputs
#   0 1
# 0 1 0
# 1 0 2
# 2 3 2

print(df)
# Outputs (The previous one isn't modified)
#   0   1
# 0 1   NaN
# 1 NaN 2
# 2 3   2
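
The same applies to the Series in the question: it only looked filled because an interactive prompt echoes the value returned by sr1.fillna(0). A small sketch of that, where result is just an illustrative name:

```python
import numpy as np
import pandas as pd

sr1 = pd.Series([1, 2, 3, np.nan, 5, 6, 7])

# This returned Series is what the interactive prompt displays.
result = sr1.fillna(0)

print(result.iloc[3])  # the filled value in the returned Series
print(sr1.iloc[3])     # the original Series still holds NaN here
```

So Series and DataFrame behave the same way here; the difference in the question comes from print(df2) showing the unmodified original, while the Series example showed the return value.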
Alberto Bonsanto