
I have this dataframe boroughCounts with these sample values:

    From    To          Count
9   None    Manhattan   302
10  Bronx   Bronx       51
11  Bronx   Manhattan   244
12  None    Brooklyn    8
13  Bronx   Queens      100
14  None    None        67

I'm trying to filter out the None values in the "From" and "To" columns, using the approach described here or here:

boroughCounts = boroughCounts[(boroughCounts.From != None) & (boroughCounts.To != None)]

boroughCounts = boroughCounts[(boroughCounts["From"] != None) & (boroughCounts["To"] != None)]

But neither version works: all the rows remain as they were. Am I using it wrong, or is there a better way to do this?
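A minimal reproduction, assuming the sample rows above rebuilt by hand (`make_frame` is a hypothetical helper, not from the question). It runs the question's filter against two versions of the data, one holding the string "None" and one holding Python's None object, and compares it with `notna()`:

```python
import pandas as pd

IDX = range(9, 15)  # index labels used in the question's sample


def make_frame(missing):
    """Rebuild the question's sample rows, putting `missing` in the empty cells."""
    m = missing
    return pd.DataFrame(
        {
            "From": pd.Series([m, "Bronx", "Bronx", m, "Bronx", m], index=IDX, dtype=object),
            "To": pd.Series(["Manhattan", "Bronx", "Manhattan", "Brooklyn", "Queens", m],
                            index=IDX, dtype=object),
            "Count": pd.Series([302, 51, 244, 8, 100, 67], index=IDX),
        }
    )


for missing in ("None", None):  # the string "None" vs. Python's None object
    df = make_frame(missing)
    # The question's filter: pandas treats a None scalar like NaN in comparisons,
    # so `!= None` is True for every element and no rows are removed.
    kept_ne = df[(df.From != None) & (df.To != None)]
    # notna() detects genuine missing values, but not the string "None".
    kept_notna = df[df.From.notna() & df.To.notna()]
    print(repr(missing), len(kept_ne), len(kept_notna))
```

Run as-is, this should print `'None' 6 6` and then `None 6 3`: the `!= None` filter keeps all six rows in both cases, while `notna()` only removes real None values.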

ayehia

2 Answers


Use this: your None values are actually the string "None", so you need to replace that string with NaN and then drop the rows that contain NaN:

import numpy as np

df_out = boroughCounts.replace('None', np.nan).dropna()
df_out

Output:

     From         To  Count
10  Bronx      Bronx     51
11  Bronx  Manhattan    244
13  Bronx     Queens    100
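A self-contained sketch of the replace-then-drop approach, assuming the sample rows from the question. Passing `subset=["From", "To"]` to `dropna` is a variation that is not in the answer above: it limits the check to those two columns, so a NaN elsewhere (for example in `Count`) would not also drop the row.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the missing values are the string "None"
boroughCounts = pd.DataFrame(
    {
        "From": ["None", "Bronx", "Bronx", "None", "Bronx", "None"],
        "To": ["Manhattan", "Bronx", "Manhattan", "Brooklyn", "Queens", "None"],
        "Count": [302, 51, 244, 8, 100, 67],
    },
    index=range(9, 15),
)

# Turn the placeholder string into a real missing value, then drop rows
# where "From" or "To" is missing (other columns are not checked).
df_out = boroughCounts.replace("None", np.nan).dropna(subset=["From", "To"])
print(df_out)
```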

Or you could use boolean indexing, comparing against the string "None":

boroughCounts[(boroughCounts.From != "None") & (boroughCounts.To != "None")]
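The same boolean-indexing idea can be written to check any number of columns at once. This is a sketch assuming the question's sample rows; `cols` and `is_placeholder` are names chosen here for illustration.

```python
import pandas as pd

# Sample rows from the question; the missing values are the string "None"
boroughCounts = pd.DataFrame(
    {
        "From": ["None", "Bronx", "Bronx", "None", "Bronx", "None"],
        "To": ["Manhattan", "Bronx", "Manhattan", "Brooklyn", "Queens", "None"],
        "Count": [302, 51, 244, 8, 100, 67],
    },
    index=range(9, 15),
)

cols = ["From", "To"]  # columns to check (illustrative choice)
# True wherever a checked cell holds the placeholder string "None"
is_placeholder = boroughCounts[cols].eq("None")
# Keep a row only if none of its checked cells is the placeholder
filtered = boroughCounts[~is_placeholder.any(axis=1)]
print(filtered)
```

Adding a column name to `cols` is all it takes to check another column.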
Scott Boston
    Using "None" worked, thanks @Scott! It seems the values were converted to strings when imported from the RDD, though I'm not sure. – ayehia Jun 20 '18 at 13:43

Inspect your dataframe to understand the types.

boroughCounts.dtypes

This will tell you that the To and From columns are of dtype object. That could mean they hold only strings, or a mix of strings and None objects. Inspect one of your Nones:

type(boroughCounts.loc[9, "From"])

This shows whether the None in the From column of row 9 (the first row of the sample) is a str or a NoneType. If it is a string, you need to change your query to compare against "None" rather than None.
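One way to run this inspection over a whole column rather than a single cell is to map `type` over it and count the results. The sketch below assumes a hypothetical column that mixes a real None with the string "None", and builds a filter that drops both kinds:

```python
import pandas as pd

# Hypothetical column mixing a real None with the string "None"
df = pd.DataFrame(
    {
        "From": pd.Series([None, "Bronx", "None", "Queens"], dtype=object),
        "Count": [1, 2, 3, 4],
    }
)

print(df.dtypes)                            # "From" reports object either way
print(df["From"].map(type).value_counts())  # how many cells are str vs. NoneType

# Drop rows whose "From" is either genuinely missing or the string "None"
mask = df["From"].notna() & df["From"].ne("None")
clean = df[mask]
print(clean)
```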

Kyle