I need to do some fairly complicated processing on each group after grouping. In pandas, it can be written as follows:

for i,g in df.groupby(['id','sid']):
    pass

In polars, however, the `groups` function returns a DataFrame, which cannot conveniently be iterated over in a for loop.

lemmingxuan
  • `df.groupby(['id','sid']).groups.keys()` will give the list of keys. Is this what you are asking? – Vignesh May 12 '22 at 05:22
  • `df.groupby().apply()` is a *very* convenient way to do whatever you think you need a for loop for. I can almost guarantee that a for loop is not the best approach... – BeRT2me May 12 '22 at 05:28
  • AttributeError: 'function' object has no attribute 'keys'. @Vignesh, is your suggestion about polars? – lemmingxuan May 12 '22 at 06:51
  • I tried to rewrite it with apply in `pandas`, but it cannot be stopped on a condition (in a for loop, I can count the iterations and stop early to test the performance of the code). My data has 1 million rows, and testing performance on a subset of the data is not suitable for the indicator I need to construct, so I want to write it with `pandas` first and then rewrite it in `polars`. @BeRT2me – lemmingxuan May 12 '22 at 10:40

1 Answer

You could use `partition_by`. With `as_dict=True`, it returns a dictionary in which the group keys map to the partitioned DataFrames.

import polars as pl

df = pl.DataFrame({
    "groups": [1, 1, 2, 2, 2],
    "values": pl.arange(0, 5, eager=True)
})

part_dfs = df.partition_by("groups", as_dict=True)

print(part_dfs)
{1: shape: (2, 2)
┌────────┬────────┐
│ groups ┆ values │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 1      ┆ 0      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1      ┆ 1      │
└────────┴────────┘,
 2: shape: (3, 2)
┌────────┬────────┐
│ groups ┆ values │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 2      ┆ 2      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2      ┆ 3      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2      ┆ 4      │
└────────┴────────┘}

ritchie46