0

I am working with a pandas DataFrame that contains some NaN values. When I call

df.fillna(df.mean())

I get the following error and cannot find the cause or a fix:

TypeError: cannot label index with a null key

All columns are int or float. I can even extract a single column into an array, call fillna() on that array, and put the result back into the DataFrame.

Any idea or hint? Thank you very much!


My code:

import pandas as pd

test = pd.read_csv("../input/test.csv")
test.fillna(test.mean(), inplace=True)

The files are test.csv and train.csv from this Kaggle competition, and I get the same error with both: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

The full traceback is:


TypeError                                 Traceback (most recent call last)
<ipython-input-29-ab3e419316e1> in <module>()
     14 
     15 #Also test has NaN's
---> 16 test.fillna(test.mean(),inplace=True)

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in fillna(self, value, method, axis, inplace, limit, downcast, **kwargs)
   2752                      self).fillna(value=value, method=method, axis=axis,
   2753                                   inplace=inplace, limit=limit,
-> 2754                                   downcast=downcast, **kwargs)
   2755 
   2756     @Appender(_shared_docs['shift'] % _shared_doc_kwargs)

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in fillna(self, value, method, axis, inplace, limit, downcast)
   3645                     if k not in result:
   3646                         continue
-> 3647                     obj = result[k]
   3648                     obj.fillna(v, limit=limit, inplace=True, downcast=downcast)
   3649                 return result

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1962             return self._getitem_multilevel(key)
   1963         else:
-> 1964             return self._getitem_column(key)
   1965 
   1966     def _getitem_column(self, key):

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   1972 
   1973         # duplicate columns & possible reduce dimensionality
-> 1974         result = self._constructor(self._data.get(key))
   1975         if result.columns.is_unique:
   1976             result = result[key]

/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3603 
   3604             if isnull(item):
-> 3605                 raise TypeError("cannot label index with a null key")
   3606 
   3607             indexer = self.items.get_indexer_for([item])

TypeError: cannot label index with a null key


D. Eggert
  • explain with your data ( input / code/ output) – Bhanuchander Udhayakumar Jan 30 '18 at 07:39
  • example code/input/output (properly formatted) would help us to help you – flx Jan 30 '18 at 07:49
  • 2
    Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Jan 30 '18 at 07:51
  • I think you need [this](https://stackoverflow.com/q/48478457/2901002) - numeric replace by means and non numeric by mode. – jezrael Jan 30 '18 at 08:11
  • Thanks @jezrael (and zeropublix and Uchiha_Itachi) for supporting and welcoming me. Now I learned how to improve my questions for the future! – D. Eggert Jan 30 '18 at 12:24

2 Answers

1

The following example seems to work nicely:

import numpy as np
import pandas

# Small all-numeric frame with missing values written as None and np.nan
x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, None],
    'x_2': [0, 1, None, 3, 0, 1, 2, np.nan],
    'x_3': [0, 1, 2, 3, 0, 1, 2, None],
    'x_4': [0, 1, 2, 3, 0, np.nan, 2, None]},
    index=[0, 1, 2, 3, 4, 5, 6, 7])

# Replace each NaN with the mean of its own column
x.fillna(x.mean(), inplace=True)

print(x)

producing:

        x_1       x_2       x_3       x_4
0  0.000000  0.000000  0.000000  0.000000
1  1.000000  1.000000  1.000000  1.000000
2  2.000000  1.166667  2.000000  2.000000
3  3.000000  3.000000  3.000000  3.000000
4  0.000000  0.000000  0.000000  0.000000
5  1.000000  1.000000  1.000000  1.333333
6  2.000000  2.000000  2.000000  2.000000
7  1.285714  1.166667  1.285714  1.333333

Take a closer look at your input data.

Pierluigi
  • Thank you for your support. Indeed I needed to look deeper into my data. The reason for the error was that I had multiple columns with the same name. After cleaning this up fillna worked just fine. – D. Eggert Jan 30 '18 at 12:22
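The cause named in the comment above, several columns sharing one name, can be checked directly. The following is a minimal sketch on a small made-up frame (the labels "a" and "b" and the values are assumptions, not the Kaggle data). It lists the repeated labels with `Index.duplicated()` and keeps only the first column for each label before filling:

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which the label "a" appears twice
df = pd.DataFrame(
    [[1.0, np.nan, 3.0], [np.nan, 5.0, 6.0]],
    columns=["a", "a", "b"],
)

# True for every column whose label already occurred further left
dup_mask = df.columns.duplicated()
duplicate_names = df.columns[dup_mask].tolist()
print("Repeated column names:", duplicate_names)

# Keep the first column for each label, then fill NaN with the column means
deduped = df.loc[:, ~dup_mask]
filled = deduped.fillna(deduped.mean())
print(filled)
```

Dropping the later duplicates is only one option. If the repeated columns hold different data, renaming them is probably the safer choice.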
0

You can try:

df['your_column'] = df['your_column'].fillna(df['your_column'].mean())

This fills the NaN values in a column with that column's own mean.
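The same column-wise idea can be applied to every numeric column at once, which also matters if the frame holds non-numeric columns that have no mean. This is a sketch under assumptions: the column names "price", "rooms" and "zone" and their values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing numeric and text columns
df = pd.DataFrame({
    "price": [100.0, np.nan, 300.0],
    "rooms": [3.0, 4.0, np.nan],
    "zone": ["A", None, "B"],
})

# Restrict the operation to numeric columns only
numeric_cols = df.select_dtypes(include="number").columns

# Fill each numeric column with its own mean; text columns stay untouched
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
print(df)
```

The text column keeps its missing value here. Filling it would need a different rule, such as the most frequent value.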

Joe
  • 1
    Thank you for your support. Yes, column-wise it worked fine. The issue was that multiple columns had the same name, which caused the error. – D. Eggert Jan 30 '18 at 12:22