Sorting by particular columns brings together all rows with the same tuple under those columns. I want to cluster all rows with the same value, but keep the groups in the same order in which their first member appeared.
Something like this:
import polars as pl
df = pl.DataFrame(dict(x=[1,0,1,0], y=[3,1,2,4]))
df.cluster('x')
# shape: (4, 2)
# ┌─────┬─────┐
# │ x ┆ y │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1 ┆ 3 │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 1 ┆ 2 │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 0 ┆ 1 │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 0 ┆ 4 │
# └─────┴─────┘