Is there a simple way to manually iterate through existing pandas groupby objects?

import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 3, 4], 'category': ['A', 'A', 'B', 'B', 'B']})
grouped = df.groupby('category')

In the application, a `for name, group in grouped:` loop follows. For manual testing I would like to do something like `group = grouped[0]` and run the code within the for loop. Unfortunately this does not work. The best thing I could find (here) was

group = df[grouped.ngroup()==0]

which relies on the original DataFrame and not solely on the groupby object, and is therefore not optimal imo.

Qaswed
  • How about `get_group` as in this [answer](https://stackoverflow.com/a/40630950/5276797)? – IanS Jul 30 '19 at 08:00
  • @IanS it helps when I know the name of the groups. But it would be way more convenient just to pass a number. – Qaswed Jul 30 '19 at 08:06
  • can you create a dict of groups with a factor, example `d={f"group{i}":g for i,g in df.groupby(df.category.factorize()[0])}` and then call each group like `d['group0']` – anky Jul 30 '19 at 08:12
  • @anky_91 So you suggest to build a second groupby object using `factorize()`, right? Is it possible to use the existing groupby-object to build such a dictionary? – Qaswed Jul 30 '19 at 08:38
  • @Qaswed the dict in my comment is using the dataframe directly, not a second groupby. :) – anky Jul 30 '19 at 09:03
  • You can try `unique_cats = df["category"].unique()` and then `df[df["category"] == unique_cats[0]]` and get the result since while using `df.groupby("col").apply(myfunction)` does the same thing iteratively. So there will be no difference. – Ilker Kurtulus Jul 30 '19 at 11:14
  • @anky_91 By "second groupby", I mean that it doesn't use the existing `grouped`. – Qaswed Jul 30 '19 at 12:35
  • @Qaswed the answer in the comment does not use the `grouped`, it uses the dataframe from scratch and does a groupby – anky Jul 30 '19 at 12:36
  • @anky_91 Thank you for your suggestion to build the groupby object differently. I clarified my question that I want to know, how one best iterates through **existing** groupby objects. – Qaswed Jul 31 '19 at 07:04
  • what do you mean by "manually iterate" *exactly*. you showed an *indexing* operation. Just create a list out of your groupby object. If you want to access it by group label, create a dict. – juanpa.arrivillaga Jul 31 '19 at 07:15
  • In the end it's unclear to me why you need this at all. Since you're iterating manually, why not select a group by name rather than by some arbitrary index? A group is just a subset of the dataframe anyway. – IanS Jul 31 '19 at 08:38
  • And if you must absolutely use an index, then run the loop `for name, group in grouped` and break when you have reached the number of iterations you're interested in. – IanS Jul 31 '19 at 08:40

1 Answer

Any iterable (here the GroupBy object) can be turned into an iterator:

group_iter = iter(grouped)

The line below will be the equivalent of selecting the first group (indexed by 0):

name, group = next(group_iter)

To get the next group, just repeat:

name, group = next(group_iter)

And so on...
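Put together with the DataFrame from the question, the steps above might look like the sketch below. The helper `nth_group` is a hypothetical name, not part of pandas; it assumes you want to jump straight to a group by its 0-based position, and it does so by skipping ahead in the iteration with `itertools.islice`.

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 3, 4], 'category': ['A', 'A', 'B', 'B', 'B']})
grouped = df.groupby('category')

# Step through the groups one at a time, exactly as the for loop would.
group_iter = iter(grouped)
first_name, first_group = next(group_iter)    # first group
second_name, second_group = next(group_iter)  # second group


def nth_group(grouped, n):
    """Return (name, group) for the n-th group (0-based) of an existing GroupBy.

    Hypothetical helper, not a pandas function. It iterates over the
    GroupBy object and discards the first n groups.
    """
    return next(islice(grouped, n, None))


name, group = nth_group(grouped, 1)
```

If `n` is larger than the number of groups, `next` raises `StopIteration`, so this is only meant for interactive testing where you know how many groups exist.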


Source: https://treyhunner.com/2018/02/python-range-is-not-an-iterator/

IanS