49

I have a large dataframe. When it was created, 'None' was used as the value where a number could not be calculated (instead of 'nan').

How can I delete all rows that have 'None' in any of its columns? I thought I could use df.dropna and set the value of na, but I can't seem to be able to.

Thanks

I think this is a good representation of the dataframe:

import pandas as pd
temp = pd.DataFrame(data=[['str1','str2',2,3,5,6,76,8],['str3','str4',2,3,'None',6,76,8]])
jlt199

5 Answers

52

Setup
Borrowed @MaxU's df

import pandas as pd

df = pd.DataFrame([
    [1, 2, 3],
    [4, None, 6],
    [None, 7, 8],
    [9, 10, 11]
], dtype=object)

Solution
You can just use pd.DataFrame.dropna as is

df.dropna()

   0   1   2
0  1   2   3
3  9  10  11

Supposing you have 'None' strings, like in this df

df = pd.DataFrame([
    [1, 2, 3],
    [4, 'None', 6],
    ['None', 7, 8],
    [9, 10, 11]
], dtype=object)

Then combine dropna with mask

df.mask(df.eq('None')).dropna()

   0   1   2
0  1   2   3
3  9  10  11

You can ensure that the entire dataframe is treated as object when you compare by casting it first.

df.mask(df.astype(object).eq('None')).dropna()

   0   1   2
0  1   2   3
3  9  10  11
piRSquared
  • I get the error `TypeError: Could not compare ['None'] with block values` with the strings solution and a dataframe of the same size as before with the first solution – jlt199 Aug 04 '17 at 18:26
36

Thanks for all your help. In the end I was able to get

df = df.replace(to_replace='None', value=np.nan).dropna()

to work. I'm not sure why your suggestions didn't work for me.
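
For completeness, a minimal runnable sketch of that approach, using the temp dataframe from the question:

import numpy as np
import pandas as pd

# the dataframe from the question, where 'None' is stored as a string
temp = pd.DataFrame(data=[['str1', 'str2', 2, 3, 5, 6, 76, 8],
                          ['str3', 'str4', 2, 3, 'None', 6, 76, 8]])

# replace the string 'None' with a real NaN, then drop any row containing NaN
temp = temp.replace(to_replace='None', value=np.nan).dropna()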

jlt199
11

UPDATE:

In [70]: temp[temp.astype(str).ne('None').all(1)]
Out[70]:
      0     1  2  3  4  5   6  7
0  str1  str2  2  3  5  6  76  8

Old answer:

In [35]: x
Out[35]:
      a     b   c
0     1     2   3
1     4  None   6
2  None     7   8
3     9    10  11

In [36]: x = x[~x.astype(str).eq('None').any(1)]

In [37]: x
Out[37]:
   a   b   c
0  1   2   3
3  9  10  11

or a slightly nicer variant from @roganjosh:

In [47]: x = x[x.astype(str).ne('None').all(1)]

In [48]: x
Out[48]:
   a   b   c
0  1   2   3
3  9  10  11
MaxU - stand with Ukraine
2

I'm a bit late to the party, but this is probably the simplest method:

df.dropna(axis=0, how='any')

Parameters: axis=0/'index' or 1/'columns', how='any' or 'all'

axis=0 drops rows (the most common case) and axis=1 drops columns instead. The how parameter determines whether a row/column is dropped when it contains any missing value (how='any') or only when all of its values are missing (how='all').
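
For illustration, here is a minimal sketch of those parameters in action (note that dropna only drops real missing values like None/NaN, not the string 'None'):

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [4, np.nan, 6],
                   [np.nan, np.nan, np.nan]])

df.dropna(axis=0, how='any')  # keeps only row 0 (rows 1 and 2 each contain a NaN)
df.dropna(axis=0, how='all')  # drops only row 2 (all of its values are NaN)
df.dropna(axis=1, how='any')  # same idea applied to columns instead of rows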

Oenomaus
  • NOTE that how='any' is the default value for .dropna(); hence this doesn't work, which is precisely what @jlt199 pointed out in the question. – iamakhilverma Sep 11 '22 at 13:16
0

If 'None' is still not removed, we can do

df = df.replace(to_replace='None', value=np.nan).dropna()

The above solution worked only partially for me: the 'None' was converted to NaN but the rows were not removed (thanks to the answer above, as it helped me move further). So I added one more line of code, which converts the particular column to strings:

df['column'] = df['column'].apply(lambda x : str(x))

This changes the NaN values to the string 'nan'; now remove those rows:

df = df[df['column'] != 'nan']
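
Putting those steps together, a minimal sketch on toy data (the column name 'column' is just a placeholder; on this toy data the last two lines are no-ops, but they show the fallback pattern):

import numpy as np
import pandas as pd

# toy data where 'None' appears as a string in a placeholder column named 'column'
df = pd.DataFrame({'column': ['1.5', 'None', '2.0'], 'other': [1, 2, 3]})

df = df.replace(to_replace='None', value=np.nan).dropna()  # 'None' -> NaN, then drop
df['column'] = df['column'].apply(lambda x: str(x))        # any surviving NaN becomes the string 'nan'
df = df[df['column'] != 'nan']                             # drop those remaining rows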

Raj