Consider the following example:

import polars as pl

zz = pl.DataFrame({'group': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
                   'col': [1, 2, 3, 4, 1, 3, 2]})

zz
Out[16]: 
shape: (7, 2)
┌───────┬─────┐
│ group ┆ col │
│ ---   ┆ --- │
│ str   ┆ i64 │
╞═══════╪═════╡
│ a     ┆ 1   │
│ a     ┆ 2   │
│ a     ┆ 3   │
│ a     ┆ 4   │
│ b     ┆ 1   │
│ b     ┆ 3   │
│ b     ┆ 2   │
└───────┴─────┘

I am trying to create a binned variable within each group, essentially replicating pandas' qcut applied per group. This is easy in pandas, as shown here:

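import pandas as pd

xx = pl.DataFrame({'group': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
                   'col': [1, 2, 3, 4, 1, 3, 2]}).to_pandas()

# Bin 'col' into 2 quantile bins within each group; labels=False returns integer codes
xx.groupby('group').col.transform(lambda x: pd.qcut(x, q=2, labels=False))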
Out[18]: 
0    0
1    0
2    1
3    1
4    0
5    1
6    0
Name: col, dtype: int64
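
To make the target behaviour explicit, here is a library-free sketch of what the per-group qcut above computes. It assumes linear interpolation for the quantile edges and right-closed bins, which matches the pandas output shown for this data; the helper names `linear_quantile` and `qcut_by_group` are made up for illustration, and the sketch does not handle duplicate bin edges.

```python
from collections import defaultdict


def linear_quantile(sorted_vals, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def qcut_by_group(groups, values, q=2):
    """Return a 0-based quantile-bin code per row, computed within each group."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    # Interior bin edges per group (just the median when q == 2).
    edges = {
        g: [linear_quantile(sorted(vals), k / q) for k in range(1, q)]
        for g, vals in by_group.items()
    }
    # Bins are right-closed, so a row's code is the number of edges strictly below it.
    return [sum(v > e for e in edges[g]) for g, v in zip(groups, values)]


groups = ['a', 'a', 'a', 'a', 'b', 'b', 'b']
col = [1, 2, 3, 4, 1, 3, 2]
codes = qcut_by_group(groups, col)
print(codes)  # [0, 0, 1, 1, 0, 1, 0]
```

The question is how to express the same per-group computation with Polars.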

But how to do this in Polars? Thanks!

ℕʘʘḆḽḘ

1 Answer


Update: Series.qcut was added in polars version 0.16.15

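Since qcut is not yet available on expressions, you can split the frame with .partition_by, call Series.qcut on each part, and concatenate the results (df below is the question's frame, zz):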

pl.concat(
    frame.get_column("col")
         .qcut([.5], maintain_order=True)
         .select(pl.col("category").to_physical())
    for frame in df.partition_by("group")
)
shape: (7, 1)
┌──────────┐
│ category │
│ ---      │
│ u32      │
╞══════════╡
│ 0        │
│ 0        │
│ 1        │
│ 1        │
│ 0        │
│ 1        │
│ 0        │
└──────────┘

Or use .apply in a window context with .over, so the lambda receives each group's values as a Series:

df.with_columns(
   pl.col("col")
     .apply(
        lambda x: x.qcut([.5], maintain_order=True)["category"].to_physical())
     .over("group")
)
shape: (7, 2)
┌───────┬─────┐
│ group ┆ col │
│ ---   ┆ --- │
│ str   ┆ u32 │
╞═══════╪═════╡
│ a     ┆ 0   │
│ a     ┆ 0   │
│ a     ┆ 1   │
│ a     ┆ 1   │
│ b     ┆ 0   │
│ b     ┆ 1   │
│ b     ┆ 0   │
└───────┴─────┘
jqurious