13

How do I get the last 'n' groups after df.groupby() and combine them into a dataframe?

data = pd.read_sql_query(sql=sqlstr, con=sql_conn, index_col='SampleTime')
grouped = data.groupby(data.index.date,sort=False)

After doing grouped.ngroups I get 277 groups in total. I want to combine the last 12 groups into a single dataframe.

stockade

5 Answers

11

Pandas GroupBy objects are iterables. To extract the last n elements of an iterable, there is generally no need to materialize the entire iterable as a list and slice off the last n elements, as this is memory-expensive.

Instead, you can use either itertools.islice (as suggested by @mtraceur) or collections.deque. Both work in O(n) time.

itertools.islice

Unlike a generator, a Pandas GroupBy object is an iterable which can be reused. You can therefore calculate the number of groups via len(g) for a GroupBy object g, or, perhaps more idiomatically, via GroupBy.ngroups, and then slice g via islice. Then use pd.concat to concatenate the resulting iterable of dataframes:

from itertools import islice
from operator import itemgetter

g = data.groupby(data.index.date, sort=False)
res = pd.concat(islice(map(itemgetter(1), g), max(0, g.ngroups - 12), None))
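A minimal, runnable sketch of this approach on toy data (the 'day' and 'val' names are illustrative, not from the question):

```python
import pandas as pd
from itertools import islice
from operator import itemgetter

# Toy frame standing in for the OP's data; 7 rows across 4 groups
data = pd.DataFrame({'day': [1, 1, 2, 2, 3, 3, 4], 'val': range(7)})
g = data.groupby('day', sort=False)

# Iterating a GroupBy yields (key, subframe) pairs; itemgetter(1) keeps the subframe.
# islice skips the first ngroups - 2 groups, leaving the last 2.
res = pd.concat(islice(map(itemgetter(1), g), max(0, g.ngroups - 2), None))
```

Here `res` contains only the rows for the last two groups (days 3 and 4).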

collections.deque

Alternatively, you can use collections.deque and specify maxlen, then concatenate as before.

from collections import deque

grouped = data.groupby(data.index.date, sort=False)
res = pd.concat(deque(map(itemgetter(1), grouped), maxlen=12))
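The same toy-data sketch for the deque variant (again with illustrative 'day'/'val' names):

```python
import pandas as pd
from collections import deque
from operator import itemgetter

data = pd.DataFrame({'day': [1, 1, 2, 2, 3, 3, 4], 'val': range(7)})
grouped = data.groupby('day', sort=False)

# The bounded deque silently discards earlier groups as later ones are appended,
# so only the last 2 subframes remain when iteration finishes.
res = pd.concat(deque(map(itemgetter(1), grouped), maxlen=2))
```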

As described in the collections docs:

Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end.... They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest.

jpp
  • in your example where can it be seen that you recover the last 5? – Yuca Oct 19 '18 at 15:48
  • Great catch using deques, but you are still iterating over all groups. So the advantage is to save memory in this case, am I right? Good catch anyway – rafaelc Oct 19 '18 at 15:52
  • @RafaelC, Yes, the advantage is principally lower memory usage. You can't avoid iterating over all groups. – jpp Oct 19 '18 at 15:54
  • @Yuca, `maxlen=12` here. – jpp Oct 19 '18 at 15:56
  • 1
    I'm curious, did you also consider/evaluate doing `itertools.islice(map(...), len(grouped) - 12, len(grouped)` instead of a `collections.deque(map(...), maxlen=12)`? My intuition is to prefer `islice` in this case, but maybe I'm missing something? – mtraceur Oct 19 '18 at 21:49
  • 1
    @mtraceur, I've gone ahead and added the `islice` solution (but using `ngroups`), hope that's ok. – jpp Oct 19 '18 at 22:56
2

Assuming you know the order of grouped:

grouped = zip(*data.groupby(data.index.date, sort=False))
pd.concat(list(grouped)[1][-12:])
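The zip trick transposes the (key, subframe) pairs into a tuple of keys and a tuple of subframes. A self-contained sketch on toy data (illustrative 'day'/'val' names), unpacking the two tuples directly:

```python
import pandas as pd

data = pd.DataFrame({'day': [1, 1, 2, 2, 3, 3, 4], 'val': range(7)})

# zip(*...) turns the stream of (key, frame) pairs into (all_keys, all_frames)
keys, frames = zip(*data.groupby('day', sort=False))
res = pd.concat(frames[-2:])  # last 2 groups
```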
rafaelc
1

Use pd.concat on a list comprehension with GroupBy.get_group:

pd.concat([grouped.get_group(x) for x in list(grouped.groups.keys())[-12:]])
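A runnable sketch of this on toy data (illustrative 'day'/'val' names); with sort=False, the keys in grouped.groups follow order of first appearance:

```python
import pandas as pd

data = pd.DataFrame({'day': [1, 1, 2, 2, 3, 3, 4], 'val': range(7)})
grouped = data.groupby('day', sort=False)

# grouped.groups maps each group key to its row labels, in group order
last_keys = list(grouped.groups.keys())[-2:]
res = pd.concat([grouped.get_group(k) for k in last_keys])
```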
Yuca
  • I just tried this on a dataframe I was working on, and this seemed to be what OP asked for? edit: the concat didn't work, but `.tail(12)` returned the final 12 groups – Mathew Savage Oct 19 '18 at 15:22
  • new version should be aligned to what OP wants :) (although it doesn't provide much vs rahlf23's version) – Yuca Oct 19 '18 at 15:41
  • @MathewSavage that is really interesting, the docs say it returns the last rows from each group, not the last groups. Maybe I should test it too – Yuca Oct 19 '18 at 19:05
0

You could pass a list comprehension to pd.concat():

import pandas as pd

df = pd.DataFrame([
    ['A', 1, 2],
    ['A', 7, 6],
    ['B', 1, 3],
    ['B', 9, 9],
    ['C', 1, 8],
    ['A', 4, 3],
    ['C', 7, 6],
    ['D', 4, 2]],
    columns=['Var', 'Val1', 'Val2'])

last_n = 2
grouped = df.groupby('Var')

pd.concat([grouped.get_group(group) for i, group in enumerate(grouped.groups) if i>=len(grouped)-last_n])

Yields:

  Var  Val1  Val2
4   C     1     8
6   C     7     6
7   D     4     2
rahlf23
0

Get the last n keys and filter the dataframe with those keys:

n = 12
grouped = data.groupby(data.index.date)
keys = list(grouped.groups.keys())
last_n_groups = data[pd.Index(data.index.date).isin(keys[-n:])]
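A self-contained sketch of this idea on toy data (the index dates are illustrative). Building the dates as an object Index once and grouping by it keeps the boolean mask aligned with the group keys:

```python
import pandas as pd

data = pd.DataFrame(
    {'val': range(7)},
    index=pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02',
                          '2018-01-02', '2018-01-03', '2018-01-03',
                          '2018-01-04']))

n = 2
dates = pd.Index(data.index.date)   # object Index of datetime.date values
grouped = data.groupby(dates)
keys = list(grouped.groups.keys())

# Boolean mask: True for rows whose date is among the last n group keys
last_n_groups = data[dates.isin(keys[-n:])]
```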