Pandas best way to subset a dataframe inplace, using a mask

Question

I have a pandas dataset that I want to downsize (remove all rows with values under x).

The mask is df[my_column] > 50

I would typically just use df = df[mask], but I want to avoid making a copy every time, particularly because this becomes error-prone inside functions: the reassignment only changes the DataFrame within the function's scope.
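To illustrate the scoping issue, here is a minimal sketch. It assumes a column named `value` (standing in for `my_column`) and two hypothetical helper functions. Reassigning `df` inside a function only rebinds the local name, while `drop(..., inplace=True)` mutates the object the caller passed in.

```python
import pandas as pd


def filter_by_rebinding(df):
    # Boolean indexing returns a new DataFrame; rebinding the local name
    # leaves the caller's object untouched unless the result is returned.
    df = df[df["value"] > 50]
    return df


def filter_in_place(df):
    # drop(..., inplace=True) modifies the very object the caller passed in.
    df.drop(df.loc[df["value"] <= 50].index, inplace=True)


data = pd.DataFrame({"value": [10, 60, 40, 80]})

filter_by_rebinding(data)          # return value ignored
print(len(data))                   # 4: the caller's frame is unchanged

filter_in_place(data)
print(data["value"].tolist())      # [60, 80]
```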

What is the best way to subset a dataset inplace?

I was thinking of something along the lines of:
df.drop(df.loc[mask].index, inplace = True)

Is there a better way to do this, or any situation where this won't work at all?
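One situation where a drop-by-label approach does not behave as intended: `drop()` removes rows by index label. If the index contains duplicate labels, every row sharing a label is removed, including rows the mask meant to keep. The following minimal sketch uses hypothetical data (column `value` in place of `my_column`). It also includes a hypothetical guard helper, `drop_where`, that checks `Index.is_unique` first.

```python
import pandas as pd

# Hypothetical data in which the index label 0 appears twice.
df = pd.DataFrame({"value": [10, 60, 70]}, index=[0, 0, 1])
mask = df["value"] > 50                 # True for the rows to keep

labels_to_drop = df.loc[~mask].index    # only the row with value 10, label 0
df.drop(labels_to_drop, inplace=True)   # but this drops BOTH rows labelled 0
print(df["value"].tolist())             # [70]: the row with value 60 is lost too


def drop_where(df, drop_mask):
    # Hypothetical helper: refuse to drop by label when labels are ambiguous.
    if not df.index.is_unique:
        raise ValueError("index has duplicate labels; drop-by-label is unsafe")
    df.drop(df.loc[drop_mask].index, inplace=True)
```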

I'm always confused by the view vs. copy distinction in pandas. Essentially, I want to give it a condition to drop and drop in place. The df.loc[mask].index will give me the indexes to drop, correct? — sapo_cosmico, Oct 13 '15 at 13:39
Sorry, what's wrong with `df = df[mask]`? Won't this eventually recover the memory for the dropped rows? — EdChum, Oct 13 '15 at 13:40
It's more error-prone, and when used in functions it makes a "local" copy, which then has to be returned. I want to do a few alterations in place, not just for memory purposes. `df.drop(df.loc[mask].index, inplace = True)` seems to work, but I expect there might be a better solution (mine will probably fail on multi-level indexes, etc.) — sapo_cosmico, Oct 13 '15 at 13:42
Not sure what you mean by makes a 'local' copy. I'd define df as a global variable, OR make it a class instance. Passing df as an argument to a bunch of functions and then doing changes to df is indeed error-prone. — alex314159, Oct 13 '15 at 14:11
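On the question in the comments of whether `df.loc[mask].index` gives the indexes to drop: with `mask = df[my_column] > 50`, it selects the rows to keep. Dropping in place therefore needs the inverted mask `~mask`. A minimal sketch, assuming a column named `value` in place of `my_column`:

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 60, 40, 80]})
mask = df["value"] > 50                  # True where the row should be KEPT

print(df.loc[mask].index.tolist())       # [1, 3]: labels of rows to keep
print(df.loc[~mask].index.tolist())      # [0, 2]: labels of rows to drop

# Invert the mask so drop() removes the rows that fail the condition.
df.drop(df.loc[~mask].index, inplace=True)
print(df["value"].tolist())              # [60, 80]
```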

Arcyno · Answer 1 · 2019-06-14T08:28:09.387

score 17

You are missing the inplace parameter:

df.drop(df[df.my_column < 50].index, inplace = True)
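Whether the value 50 itself survives depends on the comparison used for dropping. The minimal sketch below contrasts `< 50` with `<= 50`. It assumes a column literally named `my_column`, because attribute access such as `df.my_column` looks up that exact column name rather than a variable.

```python
import pandas as pd

values = [49, 50, 51]

strict = pd.DataFrame({"my_column": values})
strict.drop(strict[strict.my_column < 50].index, inplace=True)
print(strict["my_column"].tolist())     # [50, 51]: 50 is kept

inclusive = pd.DataFrame({"my_column": values})
inclusive.drop(inclusive[inclusive.my_column <= 50].index, inplace=True)
print(inclusive["my_column"].tolist())  # [51]: same rows as keeping > 50
```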

edited Jun 14 '19 at 08:28

answered Jun 07 '19 at 14:54

Arcyno


I think you want `<= 50` in the mask to drop, since the OP wanted to keep values `> 50`. – Benjamin Wang May 07 '22 at 18:46
Is there a method that does the opposite of drop? `filter`? – theonlygusti Apr 28 '23 at 16:47

score 3 · Answer 2 · answered Sep 06 '21 at 07:46


You can use df.query(), for example:

bool_series = df[my_column] > 50
df.query("@bool_series",inplace=True)
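`query(..., inplace=True)` updates the DataFrame object itself, so when it is called inside a function the change is visible to the caller. Names prefixed with `@` are looked up among the calling function's local variables. In the minimal sketch below, the helper `keep_above` and the column name `value` are hypothetical.

```python
import pandas as pd


def keep_above(df, column, threshold):
    # Boolean Series of rows to keep; "@bool_series" refers to this local.
    bool_series = df[column] > threshold
    # With inplace=True, query returns None and modifies df itself.
    df.query("@bool_series", inplace=True)


data = pd.DataFrame({"value": [10, 60, 40, 80]})
keep_above(data, "value", 50)
print(data["value"].tolist())  # [60, 80]
```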


Mkelar


score -1 · Answer 3 · answered Oct 13 '15 at 14:28


I think this works. Maybe there are better ways?

df = df.drop(df[df.my_column < 50].index)
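Reassigning the result, as in `df = df.drop(...)`, binds the name `df` to a new object. Any other reference to the original DataFrame (another variable, or a caller's variable passed into a function) still sees all the rows. The minimal sketch below uses a hypothetical `alias` variable and assumes a column literally named `my_column`.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [10, 60, 40, 80]})
alias = df                                  # second reference to the same object

df = df.drop(df[df.my_column < 50].index)   # a new object is bound to df
print(df is alias)                          # False
print(len(alias))                           # 4: the original is unchanged

df = alias                                  # start again from the original
df.drop(df[df.my_column < 50].index, inplace=True)
print(df is alias, len(alias))              # True 2
```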


Elona Mishmika



It would still be a copy and replace, but I'm curious as to why you avoided iloc? – sapo_cosmico Oct 17 '15 at 13:21

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html drop has an inplace flag – mxfh Sep 12 '18 at 15:02
