I would like to clip numerical values in a DataFrame based on the result of an expression on that DataFrame. However, the clip function only accepts floats or ints, not expressions.
Given the following:
import polars as pl
df = pl.DataFrame({"x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
How would I best clip all values to between the 20th and 80th percentile?
I tried the built-in clip function first:
df.with_column(
    pl.col("x").clip(
        min_val=pl.col("x").quantile(0.20),
        max_val=pl.col("x").quantile(0.80),
    )
    .alias("clipped")
)
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'RuntimeError'>, value: RuntimeError('BindingsError: "row type not supported <polars.internals.expr.expr.Expr object at 0x0000016F4B3053C0>"'), traceback: None }', src\lazy\dsl.rs:351:53
Traceback (most recent call last):
File "C:\Users\BWT\Anaconda3\envs\tca_ml\lib\site-packages\IPython\core\interactiveshell.py", line 3398, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-19-240f333898af>", line 2, in <cell line: 1>
pl.col("x").clip(
File "C:\Users\BWT\Anaconda3\envs\tca_ml\lib\site-packages\polars\internals\expr\expr.py", line 4840, in clip
return wrap_expr(self._pyexpr.clip(min_val, max_val))
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'RuntimeError'>, value: RuntimeError('BindingsError: "row type not supported <polars.internals.expr.expr.Expr object at 0x0000016F4B3053C0>"'), traceback: None }
The following works and yields the expected result, but is rather ugly in my opinion:
>>> lower = pl.col("x").quantile(0.20)
>>> upper = pl.col("x").quantile(0.80)
>>> df.with_columns(
        [
            pl.when(pl.col("x") < lower)
            .then(lower)
            .when(pl.col("x") > upper)
            .then(upper)
            .otherwise(pl.col("x"))
            .alias("clipped")
        ]
    )
Out[31]:
shape: (11, 2)
┌─────┬─────────┐
│ x ┆ clipped │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════════╡
│ 0 ┆ 2.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 3 ┆ 3.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 7 ┆ 7.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 8 ┆ 8.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 9 ┆ 8.0 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 10 ┆ 8.0 │
└─────┴─────────┘
What would be the best way to do this without making it overly verbose?