Here is a question where I asked how to conveniently create an expression for long .when().then() chain without hardcoding the values. Alternatives for long .when().then().when().then().otherwise() chains
I wanted to create a new question, as I need to use a custom function in the .when().then() chain, and would like to ask about that one as well.
Having the below dataframe:
import polars as pl
import numpy as np
import datetime as dt
df = pl.DataFrame(
{
"Market":["AT", "AT", "DE", "DE", "CA", "DE", "UK", "US"],
"Number of Days":[1, 2, 3, 4, 3, 4, 2, 1],
}
)
Then having the following parameters from the user in dictionary.
Start Date = Some start date that user defines
Weekmask = Indicating weekdays to be counted e.g. "1111100" Mon - Fri and "1111101" would be Mon-Fri & Sun
In the below case all the markets are having the start date 2023-01-01 except the US is 2023-06-01.
params = {
"AT": {"Start Date":dt.date(2023,1,1), "weekmask":"1111100"},
"DE": {"Start Date":dt.date(2023,1,1), "weekmask":"1111100"},
"CA": {"Start Date":dt.date(2023,1,1), "weekmask":"1111100"},
"UK": {"Start Date":dt.date(2023,1,1), "weekmask":"1111100"},
"US": {"Start Date":dt.date(2023,6,1), "weekmask":"1111111"},
}
Objective is to calculate a new date based on users input and the historical "number of days" in the dataframe but taking account the market specific workdays. For that I'm utilizing np.busday_offset and the below custom function:
def calculate_target_date(days_offset, start_date, weekmask):
# Calculate the target dates with np.busday_offset
dates_array = np.busday_offset(
dates=start_date,
offsets=days_offset,
weekmask=weekmask
)
return pl.Series(dates_array.tolist())
Creating the .when().then() expression by looping through the parameters dictionary.
# Creating the expression
expr = pl.when(False).then(None)
for market, data in params.items():
condition = pl.col("Market") == market
result = pl.col("Number of Days").map(
lambda s: calculate_target_date(
days_offset=s,
start_date=data["Start Date"],
weekmask=data["weekmask"],
)
)
expr = expr.when(condition).then(result)
expr = expr.alias("New Target Date")
And then using the expression in Polars
df.with_columns([expr])
shape: (8, 3)
┌────────┬────────────────┬─────────────────┐
│ Market ┆ Number of Days ┆ New Target Date │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ date │
╞════════╪════════════════╪═════════════════╡
│ AT ┆ 1 ┆ 2023-06-02 │
│ AT ┆ 2 ┆ 2023-06-03 │
│ DE ┆ 3 ┆ 2023-06-04 │
│ DE ┆ 4 ┆ 2023-06-05 │
│ CA ┆ 3 ┆ 2023-06-04 │
│ DE ┆ 4 ┆ 2023-06-05 │
│ UK ┆ 2 ┆ 2023-06-03 │
│ US ┆ 1 ┆ 2023-06-02 │
└────────┴────────────────┴─────────────────┘
Output calculated the busday_offset for all the markets by using the US market "Start Date" (June) rather than using market specific "Start Date" in the parameters. Why is that and how can I make it work properly?