I am new to Polars and I am not sure whether I am using `.with_columns()` correctly.
Here's a situation I encounter frequently: I have a DataFrame, and in `.with_columns()` I apply some operation to a column. For example, I convert some dates from `str` to `Date` type and then want to compute the duration between the start and end dates. I'd implement this as follows:
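```python
import polars as pl

pl.DataFrame(
    {
        "start": ["01.01.2019", "01.01.2020"],
        "end": ["11.01.2019", "01.05.2020"],
    }
).with_columns(
    [
        # Parse both string columns into pl.Date
        # (`format=` in current Polars; older releases called it `fmt=`)
        pl.col("start").str.strptime(pl.Date, format="%d.%m.%Y"),
        pl.col("end").str.strptime(pl.Date, format="%d.%m.%Y"),
    ]
).with_columns(
    [
        # Second call: the columns are now dates, so subtraction works
        (pl.col("end") - pl.col("start")).alias("duration"),
    ]
)
```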
First, I convert the two columns; then I call `.with_columns()` a second time to compute the duration.
Something shorter like this does not work:
```python
import polars as pl

pl.DataFrame(
    {
        "start": ["01.01.2019", "01.01.2020"],
        "end": ["11.01.2019", "01.05.2020"],
    }
).with_columns(
    [
        pl.col("start").str.strptime(pl.Date, format="%d.%m.%Y"),
        pl.col("end").str.strptime(pl.Date, format="%d.%m.%Y"),
        # Fails: "end" and "start" here still refer to the original string columns
        (pl.col("end") - pl.col("start")).alias("duration"),
    ]
)
```
Is there a way to avoid calling `.with_columns()` twice and to write this in a more compact way?