I have a Python function that takes a Polars dataframe, a column name, and a default value. The function returns a Polars series (with the same length as the number of rows in the dataframe) based on the column name and the default value.
- When the column name is None, just return a series of default values.
- When the column name is not None, return that column from the dataframe as a series.
I want to achieve this with a single one-line Polars expression.
Below is an example for better illustration.
The function I want has the following signature.
import polars as pl
def f(df, colname=None, value=0):
    pass
And below are the behaviors I want to have.
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
>>> f(df)
shape: (3,)
Series: '' [i64]
[
0
0
0
]
>>> f(df, "a")
shape: (3,)
Series: '' [i64]
[
1
2
3
]
This is what I tried, basically using pl.when.
def f(df, colname=None, value=0):
    return df.select(pl.when(colname is None).then(pl.lit(value)).otherwise(pl.col(colname))).to_series()
But the code errors out when colname is None, with the error message: TypeError: argument 'name': 'NoneType' object cannot be converted to 'PyString'.
Another problem is that the code below runs successfully, but it returns a dataframe with shape (1, 1):
>>> colname = None
>>> value = 0
>>> df.select(pl.when(colname is None).then(pl.lit(value)).otherwise(100))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ i32     │
╞═════════╡
│ 0       │
└─────────┘
The result I want is a dataframe with shape (3, 1), e.g.:
shape: (3, 1)
┌─────────┐
│ literal │
│ ---     │
│ i32     │
╞═════════╡
│ 0       │
│ 0       │
│ 0       │
└─────────┘
What am I supposed to do?