I am trying to follow the example in the official pandas documentation for pandas.DataFrame.fillna. Basically, I want to fill the NaN values in the "myc" column of the df dataframe with 1.

DATA dataframe

df
   myc    B   C  D
0  NaN  2.0 NaN  0
1  0.2  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

CODE 1

values = {'myc': 1}
df.fillna(value=values)

Results Goal 1

   myc    B   C  D
0  1.0  2.0 NaN  0
1  0.2  4.0 NaN  1
2  1.0  NaN NaN  5
3  1.0  3.0 NaN  4

ERROR MESSAGE 1

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-6a9e5a691bca> in <module>
      1 values = {'myc': 1}
----> 2 df.fillna(value=values)

~/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in fillna(self, value, method, axis, inplace, limit, downcast)
   4315         downcast=None,
   4316     ) -> Optional["DataFrame"]:
-> 4317         return super().fillna(
   4318             value=value,
   4319             method=method,

~/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in fillna(self, value, method, axis, inplace, limit, downcast)
   6071                     if k not in result:
   6072                         continue
-> 6073                     obj = result[k]
   6074                     obj.fillna(v, limit=limit, inplace=True, downcast=downcast)
   6075                 return result if not inplace else None

~/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2876                 if self.columns.nlevels > 1:
   2877                     return self._getitem_multilevel(key)
-> 2878                 return self._get_item_cache(key)
   2879 
   2880         # Do we have a slicer (on rows)?

~/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   3539 
   3540             loc = self.columns.get_loc(item)
-> 3541             values = self._mgr.iget(loc)
   3542             res = self._box_col_values(values, loc)
   3543 

~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in iget(self, i)
    986         Return the data as a SingleBlockManager.
    987         """
--> 988         block = self.blocks[self.blknos[i]]
    989         values = block.iget(self.blklocs[i])
    990 

TypeError: only integer scalar arrays can be converted to a scalar index

CODE 2

Later on I also tried to list the unique values of the any_feature column:

df['any_feature'].unique()

ERROR 2

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-934988075beb> in <module>
----> 1 df['any_feature'].unique()

~/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2876                 if self.columns.nlevels > 1:
   2877                     return self._getitem_multilevel(key)
-> 2878                 return self._get_item_cache(key)
   2879 
   2880         # Do we have a slicer (on rows)?

~/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   3539 
   3540             loc = self.columns.get_loc(item)
-> 3541             values = self._mgr.iget(loc)
   3542             res = self._box_col_values(values, loc)
   3543 

~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in iget(self, i)
    986         Return the data as a SingleBlockManager.
    987         """
--> 988         block = self.blocks[self.blknos[i]]
    989         values = block.iget(self.blklocs[i])
    990 

TypeError: only integer scalar arrays can be converted to a scalar index

Tried Solutions

sogu

2 Answers

Something weird is going on in your code, because:

  • the replacement of NaN should occur only in the myc column,
  • but your result also contains replaced values in other columns, e.g. in the C column, where NaN is replaced with 2.

Run just the code below (separately from your own code):

import pandas as pd
import io

txt = '''myc,B,C,D
NaN,2.0,NaN,0
3.0,4.0,NaN,1
NaN,NaN,NaN,5
NaN,3.0,NaN,4'''

df = pd.read_csv(io.StringIO(txt))
result = df.fillna(value={'myc': 1})

The result should be:

   myc    B   C  D
0  1.0  2.0 NaN  0
1  3.0  4.0 NaN  1
2  1.0  NaN NaN  5
3  1.0  3.0 NaN  4

If you get the same result, then the problem is apparently in your code, but somewhere other than the piece you presented.
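
If the isolated example works, one way to start looking elsewhere is to check how the column name resolves in the real frame. The following is only a rough sketch: the helper name describe_column_lookup is hypothetical, and the small frame built here is a stand-in for your actual df. It does not claim to identify the cause of the TypeError; it only reports a few facts about the column index that are worth comparing against a healthy frame.

```python
import pandas as pd


def describe_column_lookup(frame, name):
    """Report how `name` appears in frame.columns (hypothetical helper)."""
    cols = frame.columns
    return {
        # True when no column label is repeated
        "is_unique": cols.is_unique,
        # How many column labels are equal to `name`
        "occurrences": int((cols == name).sum()),
        # More than 1 would mean a MultiIndex on the columns
        "nlevels": cols.nlevels,
    }


# Stand-in data; replace with the frame that raises the error.
nan = float("nan")
df = pd.DataFrame({"myc": [nan, 0.2, nan, nan], "B": [2.0, 4.0, nan, 3.0]})

info = describe_column_lookup(df, "myc")
print(info)
```

For a frame in a normal state you would expect a unique, single-level column index with exactly one match; anything else in your real frame would be worth a closer look.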

Another detail to change: values is also the name of a DataFrame attribute (df.values), so it is better not to use the same name for your own variables.

Valdi_Bo
  • I have made corrections. Sorry, I typed my DataFrame in here by hand, so I put in random values. – sogu Oct 13 '20 at 15:10

Simple Solution

  • Export the dataframe to CSV: df.to_csv(r'somefilename.csv', index=False)
  • Load the same data back into a new DataFrame: df1 = pd.read_csv(r'somefilename.csv')
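
The two steps above can be sketched end to end as follows. This is a minimal illustration under some assumptions: it uses an in-memory io.StringIO buffer instead of a file on disk so that it runs anywhere, and the frame is rebuilt from the sample data in the question. The round trip creates a fresh DataFrame from plain text; it is a workaround and does not explain why the original frame raised the error.

```python
import io

import pandas as pd

nan = float("nan")

# Sample data from the question (stand-in for the problematic frame).
df = pd.DataFrame({
    "myc": [nan, 0.2, nan, nan],
    "B": [2.0, 4.0, nan, 3.0],
    "C": [nan, nan, nan, nan],
    "D": [0, 1, 5, 4],
})

# Step 1: export to CSV (here into a buffer rather than a file).
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Step 2: rewind and load the same data into a new DataFrame.
buffer.seek(0)
df1 = pd.read_csv(buffer)

# Fill NaN only in the "myc" column of the reloaded frame.
result = df1.fillna(value={"myc": 1})
print(result)
```

Note that a CSV round trip keeps only what the text format can represent, so the index (dropped here by index=False) and any special dtypes are not preserved.
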
sogu