0

Here is a question where I asked how to conveniently create an expression for long .when().then() chain without hardcoding the values. Alternatives for long .when().then().when().then().otherwise() chains

I wanted to create a new question, as I need to use a custom function in the .when().then() chain, and would like to ask about that one as well.

Having the below dataframe:

import polars as pl
import numpy as np
import datetime as dt


df = pl.DataFrame(
    {
        "Market":["AT", "AT", "DE", "DE", "CA", "DE", "UK", "US"],
        "Number of Days":[1, 2, 3, 4, 3, 4, 2, 1],
    }
)

Then having the following parameters from the user in dictionary.

Start Date = Some start date that user defines

Weekmask = Indicating weekdays to be counted e.g. "1111100" Mon - Fri and "1111101" would be Mon-Fri & Sun

In the below case all the markets are having the start date 2023-01-01 except the US is 2023-06-01.

params = {
    "AT": {"Start Date":dt.date(2023,1,1), "weekmask":"1111100"},
    "DE": {"Start Date":dt.date(2023,1,1), "weekmask":"1111100"},
    "CA": {"Start Date":dt.date(2023,1,1), "weekmask":"1111100"},
    "UK": {"Start Date":dt.date(2023,1,1), "weekmask":"1111100"},
    "US": {"Start Date":dt.date(2023,6,1), "weekmask":"1111111"},
}

Objective is to calculate a new date based on users input and the historical "number of days" in the dataframe but taking account the market specific workdays. For that I'm utilizing np.busday_offset and the below custom function:

def calculate_target_date(days_offset, start_date, weekmask):
    # Calculate the target dates with np.busday_offset
    dates_array = np.busday_offset(
        dates=start_date,
        offsets=days_offset,
        weekmask=weekmask
        )
    return pl.Series(dates_array.tolist())

Creating the .when().then() expression by looping through the parameters dictionary.

# Creating the expression
expr = pl.when(False).then(None) 
for market, data in params.items():
   condition = pl.col("Market") == market
   result = pl.col("Number of Days").map(
       lambda s: calculate_target_date(
           days_offset=s,
           start_date=data["Start Date"],
           weekmask=data["weekmask"],
       )
   )
   expr = expr.when(condition).then(result)

expr = expr.alias("New Target Date")

And then using the expression in Polars

df.with_columns([expr])
shape: (8, 3)
┌────────┬────────────────┬─────────────────┐
│ Market ┆ Number of Days ┆ New Target Date │
│ ---    ┆ ---            ┆ ---             │
│ str    ┆ i64            ┆ date            │
╞════════╪════════════════╪═════════════════╡
│ AT     ┆ 1              ┆ 2023-06-02      │
│ AT     ┆ 2              ┆ 2023-06-03      │
│ DE     ┆ 3              ┆ 2023-06-04      │
│ DE     ┆ 4              ┆ 2023-06-05      │
│ CA     ┆ 3              ┆ 2023-06-04      │
│ DE     ┆ 4              ┆ 2023-06-05      │
│ UK     ┆ 2              ┆ 2023-06-03      │
│ US     ┆ 1              ┆ 2023-06-02      │
└────────┴────────────────┴─────────────────┘

Output calculated the busday_offset for all the markets by using the US market "Start Date" (June) rather than using market specific "Start Date" in the parameters. Why is that and how can I make it work properly?

miroslaavi
  • 361
  • 2
  • 7
  • 2
    They're all the same due to: https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result - you can pass data into the lambda e.g. `lambda s, data=data: ...` – jqurious Jun 26 '23 at 11:46

1 Answers1

0

Thanks @jqurious,

so it is not to do with Polars .when().then() but Python fundamentals that I was not aware of. After changing the lambda part of the code to take in the parameters, the code works as expected.

# Creating the expression
expr = pl.when(False).then(None)
for market, data in params.items():
    condition = pl.col("Market") == market
    result = pl.col("Number of Days").map(
        lambda s, start_date = data["Start Date"], weekmask = data["weekmask"]: calculate_target_date(s,
            start_date=start_date,
            weekmask=weekmask,
        )
    )
    expr = expr.when(condition).then(result)

expr = expr.alias("New Target Date")
miroslaavi
  • 361
  • 2
  • 7