I am currently doing a groupby and ranking values in Polars:
let df = df.clone().lazy().select([
    all(),
    col("value").rank(rank_opts).over(["groupby_id"]).alias("rank"),
])
.collect().unwrap();
But I am finding it to be pretty slow. I am trying a different method, one I used in R because it was much faster than ranking: sort by value, group, and then assign the sequence 1:group_size within each group. With R's data.table it looks like this:
data_table[, rank := seq_len(.N), keyby=groupby_id]
Here, .N gives the size of the group.
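To make the result I am after concrete, here is a minimal plain-Rust sketch of the same idea with no Polars involved. The function name `rank_by_sequence` and the `(group id, value)` tuple layout are made up for illustration and are not part of any library:

```rust
/// Hypothetical helper (not a Polars API): sorts rows by (group id, value)
/// and numbers them 1..=group_size within each group.
fn rank_by_sequence(mut rows: Vec<(u32, f64)>) -> Vec<(u32, f64, usize)> {
    // Sort by group first, then by value inside each group.
    rows.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.total_cmp(&b.1)));

    let mut out = Vec::with_capacity(rows.len());
    let mut prev_group = None;
    let mut seq = 0usize;
    for (group, value) in rows {
        // Restart the counter whenever a new group begins.
        if prev_group != Some(group) {
            seq = 0;
            prev_group = Some(group);
        }
        seq += 1;
        out.push((group, value, seq));
    }
    out
}
```

Calling `rank_by_sequence` on a vector of `(groupby_id, value)` pairs returns the rows sorted by group and value, each with its position in the group. Note that tied values get distinct consecutive numbers here, which may differ from what `rank` returns depending on the rank options.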
How can I add a new column that is equivalent to 1:group_size for each group?