I will answer your generic question as well, not only your specific use case.
For your specific case, as of polars version >= 0.10.18, the recommended way to create what you want is the pl.date or pl.datetime expression.
Given this dataframe, pl.date builds the date, and dt.strftime formats it as requested.
import polars as pl
df = pl.DataFrame({
"iyear": [2001, 2001],
"imonth": [1, 2],
"iday": [1, 1]
})
df.with_columns([
pl.date("iyear", "imonth", "iday").dt.strftime("%Y-%m-%d").alias("fmt")
])
This outputs:
shape: (2, 4)
┌───────┬────────┬──────┬────────────┐
│ iyear ┆ imonth ┆ iday ┆ fmt │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str │
╞═══════╪════════╪══════╪════════════╡
│ 2001 ┆ 1 ┆ 1 ┆ 2001-01-01 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2001 ┆ 2 ┆ 1 ┆ 2001-02-01 │
└───────┴────────┴──────┴────────────┘
Other ways to collect other columns in a single expression
Below is a more generic answer to the main question. We can use a map to get multiple columns as Series, or, if we know we want to format a string column, we can use pl.format. The map offers the most utility.
df.with_columns([
# string fmt over multiple expressions
pl.format("{}-{}-{}", "iyear", "imonth", "iday").alias("date"),
# columnar lambda over multiple expressions
pl.map(["iyear", "imonth", "iday"], lambda s: s[0] + "-" + s[1] + "-" + s[2]).alias("date2"),
])
This outputs:
shape: (2, 5)
┌───────┬────────┬──────┬──────────┬──────────┐
│ iyear ┆ imonth ┆ iday ┆ date ┆ date2 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ str ┆ str │
╞═══════╪════════╪══════╪══════════╪══════════╡
│ 2001 ┆ 1 ┆ 1 ┆ 2001-1-1 ┆ 2001-1-1 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2001 ┆ 2 ┆ 1 ┆ 2001-2-1 ┆ 2001-2-1 │
└───────┴────────┴──────┴──────────┴──────────┘
Avoid row-wise operations
Although the accepted answer produces the correct result, it is not the recommended way to apply operations over multiple columns in polars. Accessing rows is tremendously slow: it incurs a lot of cache misses, has to run slow Python bytecode, and kills all parallelization and query optimization.
Note
In this specific case, the map creating string data is not recommended:
pl.map(["iyear", "imonth", "iday"], lambda s: s[0] + "-" + s[1] + "-" + s[2]).alias("date2"),
Because of the way memory is laid out, and because we create a new column per string operation, this is actually quite expensive (only with string data). That is why pl.format and pl.concat_str exist.