I have a dataframe with many columns that contain occurrences of inf. I'd like to replace these with null. All of the column names in question start with the string "ratio_".
This is what I've tried, but I get a new column named "literal" when I would like to replace the existing ones:
import numpy as np
import polars as pl

df = pl.DataFrame({"label": ["a", "b", "c"], "ratio_a": [1., 2., np.inf]})
df.with_columns(
    pl.when(pl.col(r"^ratio_\w+$").is_infinite())
    .then(None)
    .otherwise(pl.col(r"^ratio_\w+$"))
)
shape: (3, 3)
┌───────┬─────────┬─────────┐
│ label ┆ ratio_a ┆ literal │
│ ---   ┆ ---     ┆ ---     │
│ str   ┆ f64     ┆ f64     │
╞═══════╪═════════╪═════════╡
│ a     ┆ 1.0     ┆ 1.0     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ b     ┆ 2.0     ┆ 2.0     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ c     ┆ inf     ┆ null    │
└───────┴─────────┴─────────┘
I could work around this by looping over the column names instead, as in the snippet that follows, but I was surprised by the above behaviour. Is there a way to make the above expression work?
import numpy as np
import polars as pl

df = pl.DataFrame({"label": ["a", "b", "c"], "ratio_a": [1., 2., np.inf]})
# Workaround: rewrite each "ratio_" column in place, one at a time.
for col in df.columns:
    if col.startswith("ratio_"):
        df = df.with_columns(
            pl.when(pl.col(col).is_infinite())
            .then(None)
            .otherwise(pl.col(col))
            .alias(col)
        )
df