
I want to remove rows that are blank for a certain column and then filter on that column:

If I do it like this:

df['location'].dropna(inplace=True)
dfloc = df[df['location'] == myvar]

Then I get this error:

IndexingError: Unalignable boolean Series key provided

So I have to use dropna like this instead to avoid the error:

df.dropna(subset=['location'], inplace=True)
dfloc = df[df['location'] == myvar]

Does anyone know why the first method yields an error while the second does not?

Here is a sample of my DataFrame:

           uid                 date  location
1  1114-104119  2017-11-14 10:41:19   Chicago
2  1114-104056  2017-11-14 10:40:56       NaN
3  1114-104055  2017-11-14 10:40:55        LA
4  1114-103223  2017-11-14 10:32:23       NaN
5  1114-103050  2017-11-14 10:30:50       NYC
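A reproducible version of the sample above might look like the sketch below. The values and index are copied from the table, `NaN` stands in for the blank locations, and `myvar = "LA"` is an assumed filter value; the last lines run the second (working) method from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the sample table; NaN marks the missing locations.
df = pd.DataFrame(
    {
        "uid": ["1114-104119", "1114-104056", "1114-104055", "1114-103223", "1114-103050"],
        "date": pd.to_datetime([
            "2017-11-14 10:41:19",
            "2017-11-14 10:40:56",
            "2017-11-14 10:40:55",
            "2017-11-14 10:32:23",
            "2017-11-14 10:30:50",
        ]),
        "location": ["Chicago", np.nan, "LA", np.nan, "NYC"],
    },
    index=[1, 2, 3, 4, 5],
)

myvar = "LA"  # hypothetical filter value

# Second method from the question: drop whole rows on the DataFrame itself,
# then filter on the (now NaN-free) column.
df.dropna(subset=["location"], inplace=True)
dfloc = df[df["location"] == myvar]
print(dfloc)
```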
sparrow
  • Do you have some data to test with? – Scott Boston Nov 14 '17 at 20:39
  • Sorry, I can't share it. – sparrow Nov 14 '17 at 20:41
  • If someone would be kind enough to explain the downvotes I would really appreciate it. I seem to have mistakenly thought it was an interesting question which might help others in the future and would like some feedback to be a better member of the community :) – sparrow Nov 14 '17 at 20:52
  • You should have created some dummy data to go along with this question to mimic the problem, as I have done in the proposed answer below. Read [this post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) about how to create a good Pandas question. – Scott Boston Nov 14 '17 at 20:55
  • Thanks for the feedback and the explanation. – sparrow Nov 14 '17 at 20:56
  • If you go back and add some data, you might get those downvotes converted. Good luck – Scott Boston Nov 14 '17 at 20:58
  • Will do. When I told you "I couldn't share it" I thought you were talking about my actual dataset, which in retrospect was silly of me. – sparrow Nov 14 '17 at 21:04

1 Answer


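In the first method, `df['location']` gives you a Series taken from the original DataFrame, and `dropna(inplace=True)` drops rows from that Series only; the DataFrame itself is not affected. However, pandas caches that column Series on the DataFrame (at least in the version used here), so later calls to `df['location']` hand back the same truncated Series, as the demonstration below shows. When you use that mangled Series to build a Boolean mask and slice the original DataFrame, the index of the mask no longer matches the index of the DataFrame. Hence, the error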

IndexingError: Unalignable boolean Series key provided
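A minimal sketch of the mismatch itself, independent of any caching: it builds the truncated Series explicitly with a non-inplace `dropna()` and uses it as a mask. The fixed values in column `A` and the filter value `1.0` are assumptions for illustration. Newer pandas versions word the error message differently, so the sketch only checks the exception's class name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Location": [1, np.nan, 3, np.nan], "A": [7, 6, 1, 8]})

# Dropping NaN from the column leaves a Series indexed by [0, 2] only.
truncated = df["Location"].dropna()
mask = truncated == 1.0
print(mask.index.tolist())  # [0, 2]

# The DataFrame's index is [0, 1, 2, 3]; labels 1 and 3 have no mask value,
# so pandas cannot align the mask (it may warn about reindexing first).
error_name = None
try:
    df[mask]
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)  # IndexingError
```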

Here is proof.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Location':[1,np.nan,3,np.nan],'A':np.random.randint(0,10,4)})
df
   A  Location
0  7       1.0
1  6       NaN
2  1       3.0
3  8       NaN

df['Location'].dropna(inplace=True)
print(df['Location'])
0    1.0
2    3.0
Name: Location, dtype: float64

However, if you print `df` again you get the full DataFrame; the DataFrame itself has not been modified.

print(df)

   A  Location
0  7       1.0
1  6       NaN
2  1       3.0
3  8       NaN

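In the second method you are performing the drop on the original DataFrame, using `subset` to choose which column to check for NaN. Whole rows are removed, so `df` and `df['location']` keep the same index. Therefore, that method works, and the resulting Boolean Series can be used to index your DataFrame.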
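Two variations on the second method, sketched under the assumption of the toy frame above with fixed values in `A` and a hypothetical `myvar = 3.0`. The first reassigns the result of `dropna` instead of using `inplace=True`; the second skips the drop entirely, relying on the fact that NaN never compares equal to anything, so the equality mask already excludes the missing rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Location": [1, np.nan, 3, np.nan], "A": [7, 6, 1, 8]})
myvar = 3.0  # hypothetical filter value

# Option 1: drop rows on the DataFrame and keep the returned copy.
cleaned = df.dropna(subset=["Location"])
dfloc = cleaned[cleaned["Location"] == myvar]

# Option 2: no drop at all; NaN == myvar evaluates to False, so the mask
# already has the same index as df and leaves out the missing rows.
dfloc_direct = df[df["Location"] == myvar]

print(dfloc)
print(dfloc_direct)
```

Both approaches leave `df` itself untouched, which avoids relying on in-place changes to a column Series.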

Scott Boston