Apply percentile rank in Python Polars to a set of columns in a DataFrame

Question

import polars as pl

df = pl.DataFrame(
    {
        "era": ["01", "01", "02", "02", "03", "03"],
        "pred1": [1, 2, 3, 4, 5, 6],
        "pred2": [2, 4, 5, 6, 7, 8],
        "pred3": [3, 5, 6, 8, 9, 1],
        "something_else": [5, 4, 3, 67, 5, 4],
    }
)
pred_cols = ["pred1", "pred2", "pred3"]
ERA_COL = "era"

I'm trying to do the equivalent of pandas' percentile rank in Polars. Polars' rank function lacks the pct flag that pandas has.

I looked at another question here: how to replace pandas df.rank(axis=1) with polars

But the results from that question, when I apply it to my code, look off. Calculating the percentile rank in pandas gives me a single float per value, while the Polars example gives me an array rather than a float, so it must be calculating something different.

For example, this is the pandas code:

df[list(pred_cols)] = df.groupby(ERA_COL, group_keys=False).apply(
    lambda d: d[list(pred_cols)].rank(pct=True)
)

jqurious · Accepted Answer · 2022-11-30T15:46:17.260

You can use the .rank() / .count() approach from the previous question, combined with .over() so that it is computed within each era:

>>> df.select(
...    (pl.col(pred_cols).rank() / pl.col(pred_cols).count())
...    .over(ERA_COL)
... )
shape: (6, 3)
┌───────┬───────┬───────┐
│ pred1 │ pred2 │ pred3 │
│ ---   │ ---   │ ---   │
│ f64   │ f64   │ f64   │
╞═══════╪═══════╪═══════╡
│ 0.5   │ 0.5   │ 0.5   │
├───────┼───────┼───────┤
│ 1.0   │ 1.0   │ 1.0   │
├───────┼───────┼───────┤
│ 0.5   │ 0.5   │ 0.5   │
├───────┼───────┼───────┤
│ 1.0   │ 1.0   │ 1.0   │
├───────┼───────┼───────┤
│ 0.5   │ 0.5   │ 1.0   │
├───────┼───────┼───────┤
│ 1.0   │ 1.0   │ 0.5   │
└───────┴───────┴───────┘
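
To see why dividing the rank by the count within each group reproduces pct=True, the calculation can be sketched in plain Python. This is an illustration only, not how either library implements it: the function name pct_rank_by_group is made up here, and it assumes the default "average" tie handling and no missing values.

```python
from collections import defaultdict


def pct_rank_by_group(groups, values):
    """Hypothetical helper: average rank within each group, divided by group size."""
    # Collect the values that belong to each group.
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)

    out = []
    for g, v in zip(groups, values):
        members = by_group[g]
        less = sum(m < v for m in members)
        equal = sum(m == v for m in members)
        # Tied values share the mean of the positions they occupy.
        avg_rank = less + (equal + 1) / 2
        out.append(avg_rank / len(members))
    return out


era = ["01", "01", "02", "02", "03", "03"]
pred3 = [3, 5, 6, 8, 9, 1]
result = pct_rank_by_group(era, pred3)
print(result)
```

Each value comes out as a single float between 0 and 1, one per row, which is what the pred3 column above shows.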

Use .with_columns() instead of .select() to "replace" the original values while keeping the other columns:

>>> df.with_columns(
...    (pl.col(pred_cols).rank() / pl.col(pred_cols).count())
...    .over(ERA_COL)
... )
shape: (6, 5)
┌─────┬───────┬───────┬───────┬────────────────┐
│ era │ pred1 │ pred2 │ pred3 │ something_else │
│ --- │ ---   │ ---   │ ---   │ ---            │
│ str │ f64   │ f64   │ f64   │ i64            │
╞═════╪═══════╪═══════╪═══════╪════════════════╡
│ 01  │ 0.5   │ 0.5   │ 0.5   │ 5              │
├─────┼───────┼───────┼───────┼────────────────┤
│ 01  │ 1.0   │ 1.0   │ 1.0   │ 4              │
├─────┼───────┼───────┼───────┼────────────────┤
│ 02  │ 0.5   │ 0.5   │ 0.5   │ 3              │
├─────┼───────┼───────┼───────┼────────────────┤
│ 02  │ 1.0   │ 1.0   │ 1.0   │ 67             │
├─────┼───────┼───────┼───────┼────────────────┤
│ 03  │ 0.5   │ 0.5   │ 1.0   │ 5              │
├─────┼───────┼───────┼───────┼────────────────┤
│ 03  │ 1.0   │ 1.0   │ 0.5   │ 4              │
└─────┴───────┴───────┴───────┴────────────────┘
