Iterate over a subset of a Pandas groupby object

Question

I have a Pandas groupby object, and I would like to iterate over the first n groups. I've tried:

import pandas as pd
df = pd.DataFrame({'A':['a','a','a','b','b','c','c','c','c','d','d'],
                   'B':[1,2,3,4,5,6,7,8,9,10,11]})

df_grouped = df.groupby('A')
i = 0
n = 2 # for instance
for name, group in df_grouped:
    if i == n:  # stop once n groups have been processed
        break
    #DO SOMETHING
    i += 1

and

group_list = list(df_grouped.groups.keys())[:n]
for name in group_list:
    group = df_grouped.get_group(name)
    #DO SOMETHING

but I wondered if there was a more elegant/pythonic way to do it?

My actual groupby contains thousands of groups, and I'd like to perform an operation on only a subset of them, just to get an impression of the data as a whole.

Is `df_grouped.ngroup()` what you're looking for? E.g., you could create a boolean index like `df_grouped.ngroup().le(n)` ... - Chris Adams, May 03 '19 at 13:53
Yes, that's exactly what I want; I hadn't heard of `ngroup()` before. - Hannah, May 03 '19 at 14:09
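
As a rough illustration of the boolean-index idea from the comments (a sketch, not part of the original thread): `ngroup()` labels each row with the 0-based number of its group, so comparing with `.lt(n)` rather than `.le(n)` keeps exactly the first n groups. The names `group_numbers`, `mask` and `first_n` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd'],
                   'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
n = 2  # number of groups to keep

# ngroup() gives every row the 0-based number of the group it belongs to,
# in the order the groups would be seen when iterating over the groupby.
group_numbers = df.groupby('A').ngroup()

# Rows whose group number is below n belong to the first n groups.
mask = group_numbers.lt(n)
first_n = df[mask]

print(first_n)
```

With the sample data this keeps the five rows of groups 'a' and 'b'.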

score 2 · Accepted Answer · answered May 03 '19 at 13:54

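You can filter the original df down to the rows that belong to the first n groups, then do whatever else you need on the result. For n = 2 (group numbers 0 and 1):

yourdf = df[df.groupby('A').ngroup() <= 1]

or, equivalently for this data (factorize numbers the groups by first appearance rather than by sorted key):

yourdf = df[pd.factorize(df.A)[0] <= 1]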
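
If the goal is to loop over the first n groups rather than build a filtered frame, one alternative (not from the answer above, just a minimal sketch) is `itertools.islice` from the standard library, which stops the iteration after n items without a manual counter. The list `seen` is only there to record what the loop visited.

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd'],
                   'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
n = 2  # number of groups to visit

seen = []
# Iterating a groupby yields (name, group) pairs; islice stops after n of them.
for name, group in islice(df.groupby('A'), n):
    # DO SOMETHING with the group; here we just record its name and size.
    seen.append((name, len(group)))

print(seen)
```

This replaces only the loop counter; the groupby object itself is still created for the whole frame.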

BENY
