4

Are there some clever alternatives writing long when().then().otherwise() chains without hardcoding the values, see the example below:

Let's say we have the following dataframe

df = pl.DataFrame(
    {
        "Market":["AT", "AT", "DE", "DE", "CA", "DE", "UK", "US"],
        "Number of Days":[1, 2, 3, 4, 3, 4, 2, 1],
        
    }
)

User defines some conditions as a dictionary for different countries

params = {
    "AT":{"Value": 1},
    "DE":{"Value": 2},
    "CA":{"Value": 3},
    "UK":{"Value": 1},
    "US":{"Value": 2}
}

Then I hard-code the countries and use the countries in the Polars .with_columns() as below:

(
    df
    .with_columns(
        [
            pl.when(pl.col("Market") == "AT").then(pl.col("Number of Days") + params["AT"]["Value"])
            .when(pl.col("Market") == "DE").then(pl.col("Number of Days") + params["DE"]["Value"])
            .when(pl.col("Market") == "CA").then(pl.col("Number of Days") + params["CA"]["Value"])
            .when(pl.col("Market") == "UK").then(pl.col("Number of Days") + params["UK"]["Value"])
            .when(pl.col("Market") == "US").then(pl.col("Number of Days") + params["US"]["Value"])
            .otherwise(None)
            .alias("New Column")
        ]
    )
)

Is there a way for me not to hard-code the countries in .with_columns, but somehow loop through the dictionary and create expression based on the values provided?¨

I tried the below but it says I have duplicate column names.

exprs = []
for market, data in params.items():
    condition = (pl.col("Market") == market)
    result = (pl.col("Number of Days") + data["Value"])
    expr = pl.when(condition).then(result)
    exprs.append(expr)

df.with_columns(exprs)
miroslaavi
  • 361
  • 2
  • 7

2 Answers2

3

You can use map_dict

(
    df
        .with_columns((
            pl.col("Number of Days") + 
            pl.col("Market")
                .map_dict({x:y['Value'] for x,y in params.items()})
            )
            .alias("New Column")
        )
)

Alternatively you can make params its own df and then join them. map_dict works by converting your dict into a df and then joining them but depending on where your params are coming from for real this might be preferred.

paramsdf = pl.DataFrame({'Market':params.keys(), 
                         'Value':[x['Value'] for x in params.values()]})
(
    df.join(paramsdf, on="Market")
        .with_columns((pl.col("Number of Days")+ pl.col("Value")).alias("New Column"))
        .drop("Value")
)
Dean MacGregor
  • 11,847
  • 9
  • 34
  • 72
2

Your for loop is creating separate pl.when() objects, hence the duplicates error.

You can pass exprs to pl.coalesce

df.with_columns(pl.coalesce(exprs))
shape: (8, 2)
┌────────┬────────────────┐
│ Market ┆ Number of Days │
│ ---    ┆ ---            │
│ str    ┆ i64            │
╞════════╪════════════════╡
│ AT     ┆ 2              │
│ AT     ┆ 3              │
│ DE     ┆ 5              │
│ DE     ┆ 6              │
│ CA     ┆ 6              │
│ DE     ┆ 6              │
│ UK     ┆ 3              │
│ US     ┆ 3              │
└────────┴────────────────┘

You can also create a single .when().then() chain:

expr = pl.when(False).then(None) # "empty"

for market, data in params.items():
   condition = (pl.col("Market") == market)
   result = (pl.col("Number of Days") + data["Value"])
   expr = expr.when(condition).then(result)
df.with_columns(expr)
shape: (8, 3)
┌────────┬────────────────┬─────────┐
│ Market ┆ Number of Days ┆ literal │
│ ---    ┆ ---            ┆ ---     │
│ str    ┆ i64            ┆ i64     │
╞════════╪════════════════╪═════════╡
│ AT     ┆ 1              ┆ 2       │
│ AT     ┆ 2              ┆ 3       │
│ DE     ┆ 3              ┆ 5       │
│ DE     ┆ 4              ┆ 6       │
│ CA     ┆ 3              ┆ 6       │
│ DE     ┆ 4              ┆ 6       │
│ UK     ┆ 2              ┆ 3       │
│ US     ┆ 1              ┆ 3       │
└────────┴────────────────┴─────────┘

Another option is to modify params so you have a flat dict and use .map_dict

params_kv = { k: list(v.values()).pop() for k, v in params.items() }

df.with_columns(value = pl.col('Market').map_dict(params_kv))
shape: (8, 3)
┌────────┬────────────────┬───────┐
│ Market ┆ Number of Days ┆ value │
│ ---    ┆ ---            ┆ ---   │
│ str    ┆ i64            ┆ i64   │
╞════════╪════════════════╪═══════╡
│ AT     ┆ 1              ┆ 1     │
│ AT     ┆ 2              ┆ 1     │
│ DE     ┆ 3              ┆ 2     │
│ DE     ┆ 4              ┆ 2     │
│ CA     ┆ 3              ┆ 3     │
│ DE     ┆ 4              ┆ 2     │
│ UK     ┆ 2              ┆ 1     │
│ US     ┆ 1              ┆ 2     │
└────────┴────────────────┴───────┘

Which is shorthand for turning the dict into a dataframe and performing a .join()

jqurious
  • 9,953
  • 1
  • 4
  • 14