1
  value  Group  something
0     a    1          1
1     b    1          2
2     c    1          4
3     c    2          9
4     b    2         10
5     x    2          5
6     d    2          3
7     e    3          5
8     d    2         10
9     a    3          5

I want to select the last 3 rows of each group (from the df above), as shown below, but perform the operation in place. I want to ensure that only the new df object is kept in memory after the assignment. What would be an efficient way of doing it?

df = df.groupby('Group').tail(3)

The result should look like the following:

  value  Group  something
0     a    1          1
1     b    1          2
2     c    1          4
5     x    2          5
6     d    2          3
7     e    3          5
8     d    2         10
9     a    3          5

N.B.: This question is related to "Keeping the last N duplicates in pandas".
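To reproduce the example, the frame can be rebuilt as follows. This is a minimal sketch that assumes pandas is installed; the column values are copied from the table above, and the name `result` is introduced here only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "value": list("abccbxdeda"),
    "Group": [1, 1, 1, 2, 2, 2, 2, 3, 2, 3],
    "something": [1, 2, 4, 9, 10, 5, 3, 5, 10, 5],
})

# Keep the last 3 rows of each group; the original index labels are preserved.
result = df.groupby("Group").tail(3)
print(result)
```

Groups with 3 or fewer rows (groups 1 and 3 here) are returned whole, so only rows 3 and 4 of group 2 are left out.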

user3471881
gibbz00
  • Why do you not want to use the code you give as an example (`df = df.groupby('Group').tail(3)`)? You can't do an in-place groupby, as the grouped dataframe is a fundamentally different object. – johnpaton Nov 26 '18 at 17:06
  • @johnpaton I edited the post a little bit. My goal is to ensure that I am keeping only the new df object in memory after assignment. – gibbz00 Nov 26 '18 at 17:10
  • When you overwrite it, you only have the new one. – BENY Nov 26 '18 at 17:12
  • @gibbz00 that happens with the current formulation as well. Python's garbage collection will take care of the old one once there are no more active references to it. – johnpaton Nov 26 '18 at 17:13
  • @W-B Thank you. That answers the question. Can you kindly post that as an answer. – gibbz00 Nov 26 '18 at 17:13
  • @johnpaton Thank you. I did not know it's automatically taken care of once all the active references are gone. Can you give an example of an active reference that will make the old df linger? – gibbz00 Nov 26 '18 at 17:15
  • @gibbz00 giving the output df a new name (`df_grouped = df.groupby('Group').tail(3)`) would mean that `df` still references the old dataframe, whereas `df_grouped` references the new one. Now they will both be stored in memory. – johnpaton Nov 26 '18 at 17:16
  • Better to assign the result to a new name, like `df_new = df.groupby('Group').tail(3)`, if you don't want to overwrite? – Karn Kumar Nov 26 '18 at 17:17

3 Answers

1

df = df.groupby('Group').tail(3) is already an efficient way of doing it. Because you are overwriting the df variable, Python will take care of releasing the memory of the old dataframe, and you will only have access to the new one.
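The difference between overwriting the name and binding the result to a new name can be sketched with a weak reference. This is an illustration only: `Frame` is a hypothetical stand-in class (not a pandas type) so that the sketch runs without pandas, and the `gc.collect()` calls are there for interpreters that do not free objects immediately.

```python
import gc
import weakref


class Frame:
    """Hypothetical stand-in for a large DataFrame (not a pandas class)."""

    def __init__(self, rows):
        self.rows = rows

    def tail(self, n):
        # Return a new object, as a pandas tail() call would.
        return Frame(self.rows[-n:])


# Case 1: overwrite the same name.
df = Frame(list(range(10)))
old_ref = weakref.ref(df)      # a weak reference does not keep the object alive
df = df.tail(3)                # the old object loses its only strong reference
gc.collect()
released_after_overwrite = old_ref() is None

# Case 2: bind the result to a new name.
df2 = Frame(list(range(10)))
old_ref2 = weakref.ref(df2)
df_grouped = df2.tail(3)       # df2 still refers to the original object
gc.collect()
kept_with_new_name = old_ref2() is not None

print(released_after_overwrite, kept_with_new_name)
```

Any other live reference to the original object (a second variable, a list element, an attribute) would keep it alive in the same way as `df2` does in the second case.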

johnpaton
1

Trying way too hard to guess what you want.

NOTE: using Pandas inplace argument where it is available is NO guarantee that a new DataFrame won't be created in memory. In fact, it may very well create a new DataFrame in memory and replace the old one behind the scenes.

from collections import defaultdict

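def f(s):
  # Walk the column from the bottom up, counting how often each group has been seen.
  # Yield the index label of every row beyond the last 3 of its group.
  c = defaultdict(int)
  for i, x in zip(s.index[::-1], s.values[::-1]):
    c[x] += 1
    if c[x] > 3:
      yield i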

df.drop([*f(df.Group)], inplace=True)
df

  value  Group  something
0     a      1          1
1     b      1          2
2     c      1          4
5     x      2          5
6     d      2          3
7     e      3          5
8     d      2         10
9     a      3          5
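For comparison, the same in-place drop can be sketched without a Python-level loop. This is not part of the answer above; it assumes pandas is installed and uses `GroupBy.cumcount(ascending=False)`, which numbers the rows of each group starting from the end. The name `pos_from_end` is introduced here for illustration.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "value": list("abccbxdeda"),
    "Group": [1, 1, 1, 2, 2, 2, 2, 3, 2, 3],
    "something": [1, 2, 4, 9, 10, 5, 3, 5, 10, 5],
})

# 0 for the last row of each group, 1 for the row before it, and so on.
pos_from_end = df.groupby("Group").cumcount(ascending=False)

# Drop every row that is not among the last 3 of its group.
df.drop(index=df.index[(pos_from_end >= 3).to_numpy()], inplace=True)
print(df)
```

As the note above says, `inplace=True` makes no promise about what pandas allocates internally, so this sketch is about convenience and not about memory.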
piRSquared
  • I was imagining a solution like this one, as drop has an `inplace` parameter. However, I did not know that `df = df.groupby('Group').tail(3)` already ensures that the old df is released from memory once overwritten. – gibbz00 Nov 26 '18 at 17:22
  • Yeah, if you aren't concerned with the temporary memory being consumed then released, then you should absolutely use `df.groupby('Group').tail(3)`. You didn't mention performance so I assume it isn't an issue. – piRSquared Nov 26 '18 at 17:24
  • What is the asterisk(*) doing in [*f(df.Group)] ? – gibbz00 Nov 26 '18 at 17:29
  • `*` in that context unpacks the iterable. `[*f(df.Group)]` is a fancy way of writing `list(f(df.Group))`. – piRSquared Nov 26 '18 at 17:31
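The equivalence mentioned in the last comment can be checked with any generator. In this small sketch, `evens` is a hypothetical generator used only for illustration.

```python
def evens(limit):
    """Yield the even numbers below limit."""
    for n in range(limit):
        if n % 2 == 0:
            yield n


unpacked = [*evens(10)]     # unpack the generator into a list display
listed = list(evens(10))    # pass the generator to the list constructor

print(unpacked == listed)
```

Both forms consume the generator once and build a new list, so the choice between them is a matter of style.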
1

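Your answer is already in the post. However, as said earlier in the comments, you are overwriting the existing df. To avoid that, assign the result to a new name as below (assigning the whole result to a single column such as df['new_col'] does not work, because the result is a multi-column DataFrame):

df_new = df.groupby('Group').tail(3)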

However, out of curiosity, if you are not concerned about the groupby and are only looking for the last N rows of the df, you can do it as below:

df[-2:]   #  last 2 rows
Karn Kumar