0

[Modified & extended from another question] My data frame has string columns A, L, G. L & G have a Letter with a 2-digit Number. If the A string is "foo" or "spam", the G string in that row should be changed to it's original Letter + L's original Number, and the L string should be changed to "XX".

df = pl.DataFrame(
    {
        "A": ["foo", "ham", "spam", "egg",],
        "L": ["A54", "A12", "B84", "C12"],
        "G": ["X34", "C84", "G96", "L60",],
    }
)
print(df)

shape: (4, 3)
┌──────┬─────┬─────┐
│ A    ┆ L   ┆ G   │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo  ┆ A54 ┆ X34 │
│ ham  ┆ A12 ┆ C84 │
│ spam ┆ B84 ┆ G96 │
│ egg  ┆ C12 ┆ L60 │
└──────┴─────┴─────┘

Expected result:

shape: (4, 3)
┌──────┬─────┬─────┐
│ A    ┆ L   ┆ G   │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ str │
╞══════╪═════╪═════╡
│ foo  ┆ XX  ┆ X54 │
│ ham  ┆ A12 ┆ C84 │
│ spam ┆ XX  ┆ G84 │
│ egg  ┆ C12 ┆ L60 │
└──────┴─────┴─────┘
Des1303
  • 79
  • 4

2 Answers2

1

As your conditional logic is different per column, you'll want to specify the conditionals separately. You can give df.with_columns multiple arguments to achieve this.

This solution assumes that the "letter" part is always one character and the "number" is always two characters

import polars as pl
from polars.testing import assert_frame_equal

df = pl.DataFrame(
    {
        "A": ["foo", "ham", "spam", "egg",],
        "L": ["A54", "A12", "B84", "C12"],
        "G": ["X34", "C84", "G96", "L60",],
    }
)

# factor out the predicate to avoid repeating it 
predicate = pl.when(pl.col("A").is_in(["foo", "spam"]))
result = (
    df.with_columns(
        predicate.then(pl.lit("XX")).otherwise(pl.col("L")).alias("L"),
        predicate.then(
            pl.col("G").str.slice(0, 1) + 
            pl.col("L").str.slice(1, 2)
        ).otherwise(pl.col("G")),
    )
)

# test the result
expected_result = pl.from_repr(
    """
    ┌──────┬─────┬─────┐
    │ A    ┆ L   ┆ G   │
    │ ---  ┆ --- ┆ --- │
    │ str  ┆ str ┆ str │
    ╞══════╪═════╪═════╡
    │ foo  ┆ XX  ┆ X54 │
    │ ham  ┆ A12 ┆ C84 │
    │ spam ┆ XX  ┆ G84 │
    │ egg  ┆ C12 ┆ L60 │
    └──────┴─────┴─────┘
    """
)
assert_frame_equal(result, expected_result)
  • It appears we posted very similar solutions at the same time! But this is a better solution with reuse of `predicate` and `is_in` instead of `contains`. – robertdj Jul 26 '23 at 11:34
  • Yes, almost identical! Shows that the polars API really provides "one obvious way" to do things. Thanks @robertdj – Henry Harbeck Jul 26 '23 at 12:31
1

Here is a solution

(
    df
    .with_columns(
        pl.when(pl.col('A').str.contains('^foo|spam$'))
        .then(pl.col('G').str.slice(0, 1) + pl.col('L').str.slice(1, 2))
        .otherwise(pl.col('G'))
        .alias('G'),

        pl.when(pl.col('A').str.contains('^foo|spam$'))
        .then(pl.lit('XX'))
        .otherwise(pl.col('L'))
        .alias('L'),
    )
)
robertdj
  • 909
  • 1
  • 5
  • 10