15

I have a pandas DataFrame that I want to downsize (remove all rows whose value is under x).

The mask of rows to keep is df[my_column] > 50

I would typically just use df = df[mask], but I want to avoid making a copy every time, particularly because it is error-prone inside functions (the reassignment only rebinds the name in the function's scope, so the caller's DataFrame is left unchanged).
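To make the scoping problem concrete, here is a minimal sketch. The function names, the `value` column, and the sample data are hypothetical and only serve as illustration; the first function reassigns, the second drops in place.

```python
import pandas as pd

def shrink_by_reassignment(df, column, threshold):
    # Rebinds the local name `df` only; the caller's DataFrame is untouched
    # unless the caller uses the return value.
    df = df[df[column] > threshold]
    return df

def shrink_in_place(df, column, threshold):
    # Mutates the object the caller passed in: drops the rows NOT in the mask.
    mask = df[column] > threshold
    df.drop(df[~mask].index, inplace=True)

data = pd.DataFrame({"value": [10, 60, 40, 90]})  # hypothetical sample data

shrink_by_reassignment(data, "value", 50)
len_after_reassignment = len(data)  # caller's frame has not changed

shrink_in_place(data, "value", 50)
len_after_in_place = len(data)      # caller's frame is now smaller
```

If the reassignment style is preferred, the caller has to write `data = shrink_by_reassignment(data, "value", 50)` to see the change.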

What is the best way to subset a DataFrame in place?

I was thinking of something along the lines of
df.drop(df.loc[~mask].index, inplace=True)

Is there a better way to do this, or any situation where this won't work at all?

sapo_cosmico
  • You mean `view = df.loc[df[my_column] > 50]`? – EdChum Oct 13 '15 at 13:37
  • I'm always confused by the view vs copy thing in pandas. Essentially I want to give it a condition to drop, and drop in place. The `df.loc[mask].index` will give me the indexes to drop, correct? – sapo_cosmico Oct 13 '15 at 13:39
  • Sorry, what's wrong with `df = df[mask]`? This will eventually recover the memory for the dropped rows? – EdChum Oct 13 '15 at 13:40
  • Well, `mask` itself is a boolean index – EdChum Oct 13 '15 at 13:40
  • More error-prone, and when used in functions it makes a "local" copy, which then has to be returned. I want to do a few alterations in place, not just for memory purposes. `df.drop(df.loc[~mask].index, inplace=True)` seems to work, but I expect there might be a better solution (as mine will probably fail on multi-level indexes, etc.) – sapo_cosmico Oct 13 '15 at 13:42
  • Not sure what you mean by "makes a 'local' copy". I'd define df as a global variable, OR make it a class instance. Passing df as an argument to a bunch of functions and then making changes to df is indeed error-prone. – alex314159 Oct 13 '15 at 14:11

3 Answers

17

You are missing the inplace parameter:

df.drop(df[df[my_column] < 50].index, inplace=True)
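One situation where dropping by index label may not do what is wanted is an index with duplicate labels: `drop` removes every row carrying a given label, not only the rows that failed the condition. A small sketch, assuming a hypothetical `value` column and a deliberately duplicated index:

```python
import pandas as pd

# Hypothetical data: index labels "a" and "b" each appear twice.
df = pd.DataFrame({"value": [10, 60, 40, 90]}, index=["a", "a", "b", "b"])
mask = df["value"] > 50

# Label-based drop: the labels of the failing rows are "a" and "b",
# so every row with either label is removed, including rows that pass.
by_label = df.drop(df[~mask].index)

# Boolean indexing works on row positions and keeps exactly the passing rows.
by_mask = df[mask]
```

With a unique index (for example the default `RangeIndex`) the two approaches select the same rows.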

Arcyno
3

You can use df.query(), like:

bool_series = df[my_column] > 50
df.query("@bool_series", inplace=True)
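As a variation, the condition can also be written directly in the query string instead of precomputing a boolean Series. This is a sketch under assumptions: the `value` column, the sample data, and the `threshold` variable are hypothetical, and backtick quoting of column names needs a reasonably recent pandas.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 60, 40, 90]})  # hypothetical sample data
my_column = "value"  # column name held in a variable, as in the question
threshold = 50       # hypothetical cutoff

# Backticks quote the column name; @threshold refers to the Python variable.
# inplace=True modifies df itself instead of returning a new DataFrame.
df.query(f"`{my_column}` > @threshold", inplace=True)
```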
Mkelar
-1

I think this works, though there may be better ways:

df = df.drop(df[df[my_column] < 50].index)

Elona Mishmika