I am trying to replace the NaN values in certain columns of a pandas DataFrame with the sum of their row. Example data:

Items | Estimate1 | Estimate2 | Estimate3
Item1 | NaN       | NaN       | 8
Item2 | NaN       | NaN       | 5.5

I would like Estimate1 and Estimate2 to be 8 for Item1 and 5.5 for Item2.

So far I have tried df.fillna(df.sum(), inplace=True), but the DataFrame does not change. Can anyone help me correct my code or recommend the right way to do this?

Cleb
Avagut
  • Can you try to provide `axis=1` to both the `fillna` and `sum` call? – joris Apr 06 '15 at 20:03
  • @Joris I tried df.fillna(df.sum(), inplace=True, axis=1) and got an error: 'NotImplementedError: Currently only can fill with dict/Series column by column' – Avagut Apr 06 '15 at 20:09
  • Indeed, you're right. See my answer for a workaround – joris Apr 06 '15 at 20:24

2 Answers

Passing axis=1 does not work, because fillna can only fill from a Series column by column, not row by row.
A workaround is to 'broadcast' the sum of each row into a DataFrame that has the same index and columns as the original one. With a slightly modified example DataFrame:

In [57]: df = pd.DataFrame([[np.nan, 3.3, 8], [np.nan, np.nan, 5.5]], index=['Item1', 'Item2'], columns=['Estimate1', 'Estimate2', 'Estimate3'])

In [58]: df
Out[58]:
       Estimate1  Estimate2  Estimate3
Item1        NaN        3.3        8.0
Item2        NaN        NaN        5.5

In [59]: fill_value = pd.DataFrame({col: df.sum(axis=1) for col in df.columns})

In [60]: fill_value
Out[60]:
       Estimate1  Estimate2  Estimate3
Item1       11.3       11.3       11.3
Item2        5.5        5.5        5.5

In [61]: df.fillna(fill_value)
Out[61]:
       Estimate1  Estimate2  Estimate3
Item1       11.3        3.3        8.0
Item2        5.5        5.5        5.5

There is an open enhancement issue for this: https://github.com/pydata/pandas/issues/4514
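
For anyone who wants to run this outside IPython, here is the same workaround as a standalone script. It is only a sketch, assuming pandas and NumPy are installed; the names row_sum, filled and filled_t are illustrative and not part of the session above. The last step is a possible shortcut, not something the linked issue proposes: since fillna with a Series works column by column, transposing first turns the rows into fillable columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 3.3, 8], [np.nan, np.nan, 5.5]],
    index=["Item1", "Item2"],
    columns=["Estimate1", "Estimate2", "Estimate3"],
)

# Row sums; NaN values are skipped by default (skipna=True).
row_sum = df.sum(axis=1)

# Broadcast the row sums to a frame with the same index and columns as df,
# then let fillna pick the matching cell for every NaN.
fill_value = pd.DataFrame({col: row_sum for col in df.columns})
filled = df.fillna(fill_value)

# Possible shortcut: after transposing, the items are the columns, so the
# Series of row sums (indexed by item) can be used directly. Transpose back.
filled_t = df.T.fillna(row_sum).T

print(filled)
```

Neither variant modifies df in place, so assign the result back if you want to keep it.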

joris

As an alternative, you can use apply with a lambda expression:

df.apply(lambda row: row.fillna(row.sum()), axis=1)

which yields the desired outcome (shown for the modified example DataFrame from the other answer):

       Estimate1  Estimate2  Estimate3
Item1       11.3        3.3        8.0
Item2        5.5        5.5        5.5

I am not sure about its efficiency, though.
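
One way to check is to time the row-wise apply against a vectorized equivalent. The sketch below is a rough benchmark under stated assumptions, not a definitive measurement: the random test frame, its size and the helper names (with_apply, with_where) are made up for illustration, and DataFrame.where with axis=0 is swapped in as the vectorized variant. Timings depend on the machine and the pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 2000 rows, roughly 30% of the cells set to NaN.
rng = np.random.default_rng(0)
values = rng.random((2000, 3))
values[rng.random(values.shape) < 0.3] = np.nan
df = pd.DataFrame(values, columns=["Estimate1", "Estimate2", "Estimate3"])


def with_apply(frame):
    # Calls the lambda once per row.
    return frame.apply(lambda row: row.fillna(row.sum()), axis=1)


def with_where(frame):
    # Keeps non-NaN cells and takes the row sum elsewhere;
    # axis=0 aligns the Series of row sums with the row index.
    return frame.where(frame.notna(), frame.sum(axis=1), axis=0)


# Both approaches should agree before their timings are compared.
pd.testing.assert_frame_equal(with_apply(df), with_where(df))

for func in (with_apply, with_where):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f"{func.__name__}: {seconds / 3:.4f} s per call")
```

Because apply with axis=1 runs a Python function for every row, it will probably fall behind on large frames, but it is worth measuring on your own data.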

Cleb