import polars as pl

df = pl.DataFrame(
    {"a": [1, 2, 3, 4, 5],
     "b": [2, 3, 4, 5, 6],
     "x": [1, 3, 5, 7, 9]}
)
df.with_columns(
    pl.col('x').cut([2, 4, 6]).alias('x_cut')
)
shape: (5, 4)
┌─────┬─────┬─────┬───────────┐
│ a ┆ b ┆ x ┆ x_cut │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ cat │
╞═════╪═════╪═════╪═══════════╡
│ 1 ┆ 2 ┆ 1 ┆ (-inf, 2] │
│ 2 ┆ 3 ┆ 3 ┆ (2, 4] │
│ 3 ┆ 4 ┆ 5 ┆ (4, 6] │
│ 4 ┆ 5 ┆ 7 ┆ (6, inf] │
│ 5 ┆ 6 ┆ 9 ┆ (6, inf] │
└─────┴─────┴─────┴───────────┘
Old solution
As of 0.16.8, the top-level function pl.cut has been deprecated. You now have to use the Series method .cut instead, which returns a three-column DataFrame.
# get x column as a Series and then apply .cut method
df['x'].cut(bins=[2, 4, 6])
It returns a DataFrame like the following:
shape: (5, 3)
┌─────┬─────────────┬─────────────┐
│ x ┆ break_point ┆ category │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ cat │
╞═════╪═════════════╪═════════════╡
│ 1.0 ┆ 2.0 ┆ (-inf, 2.0] │
│ 3.0 ┆ 4.0 ┆ (2.0, 4.0] │
│ 5.0 ┆ 6.0 ┆ (4.0, 6.0] │
│ 7.0 ┆ inf ┆ (6.0, inf] │
│ 9.0 ┆ inf ┆ (6.0, inf] │
└─────┴─────────────┴─────────────┘
If you just want to add the cut categories to your main DataFrame, you can do so directly in with_columns():
df.with_columns(
    df['x'].cut(bins=[2, 4, 6], maintain_order=True)['category'].alias('x_cut')
)
# or
df.with_columns(
    x_cut=df['x'].cut(bins=[2, 4, 6], maintain_order=True)['category']
)
shape: (5, 4)
┌─────┬─────┬─────┬─────────────┐
│ a ┆ b ┆ x ┆ x_cut │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ cat │
╞═════╪═════╪═════╪═════════════╡
│ 1 ┆ 2 ┆ 1 ┆ (-inf, 2.0] │
│ 2 ┆ 3 ┆ 3 ┆ (2.0, 4.0] │
│ 3 ┆ 4 ┆ 5 ┆ (4.0, 6.0] │
│ 4 ┆ 5 ┆ 7 ┆ (6.0, inf] │
│ 5 ┆ 6 ┆ 9 ┆ (6.0, inf] │
└─────┴─────┴─────┴─────────────┘