I am trying to create a series of new columns, each based on the most recently created column. I can achieve this with with_columns() and simple multiplication. Since I want a long list of new columns, I am thinking of using a loop with an f-string. However, I am not sure how to use an f-string in Polars column names.

import polars as pl

df = pl.DataFrame(
    {
        "id": ["NY", "TK", "FD"],
        "eat2003": [-9, 3, 8],
        "eat2004": [10, 11, 8],
    }
)
df

┌─────┬─────────┬─────────┐
│ id  ┆ eat2003 ┆ eat2004 │
│ --- ┆ ---     ┆ ---     │
│ str ┆ i64     ┆ i64     │
╞═════╪═════════╪═════════╡
│ NY  ┆ -9      ┆ 10      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ TK  ┆ 3       ┆ 11      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ FD  ┆ 8       ┆ 8       │
└─────┴─────────┴─────────┘

(
    df
    .with_columns((pl.col('eat2004') * 2).alias('eat2005'))
    .with_columns((pl.col('eat2005') * 2).alias('eat2006'))
    .with_columns((pl.col('eat2006') * 2).alias('eat2007'))
)

Expected output: 
┌─────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ id  ┆ eat2003 ┆ eat2004 ┆ eat2005 ┆ eat2006 ┆ eat2007 │
│ --- ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
│ str ┆ i64     ┆ i64     ┆ i64     ┆ i64     ┆ i64     │
╞═════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ NY  ┆ -9      ┆ 10      ┆ 20      ┆ 40      ┆ 80      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ TK  ┆ 3       ┆ 11      ┆ 22      ┆ 44      ┆ 88      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ FD  ┆ 8       ┆ 8       ┆ 16      ┆ 32      ┆ 64      │
└─────┴─────────┴─────────┴─────────┴─────────┴─────────┘
codedancer

1 Answer

If each new column can be computed directly from eat2004, I would suggest the following approach:

expr_list = [
    (pl.col('eat2004') * (2**i)).alias(f"eat{2004 + i}")
    for i in range(1, 8)
]

(
    df
    .with_columns(expr_list)
)
shape: (3, 10)
┌─────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ id  ┆ eat2003 ┆ eat2004 ┆ eat2005 ┆ eat2006 ┆ eat2007 ┆ eat2008 ┆ eat2009 ┆ eat2010 ┆ eat2011 │
│ --- ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
│ str ┆ i64     ┆ i64     ┆ i64     ┆ i64     ┆ i64     ┆ i64     ┆ i64     ┆ i64     ┆ i64     │
╞═════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ NY  ┆ -9      ┆ 10      ┆ 20      ┆ 40      ┆ 80      ┆ 160     ┆ 320     ┆ 640     ┆ 1280    │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ TK  ┆ 3       ┆ 11      ┆ 22      ┆ 44      ┆ 88      ┆ 176     ┆ 352     ┆ 704     ┆ 1408    │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ FD  ┆ 8       ┆ 8       ┆ 16      ┆ 32      ┆ 64      ┆ 128     ┆ 256     ┆ 512     ┆ 1024    │
└─────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

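As long as the Expressions are independent of each other, we can run them all in a single with_columns context, where Polars can evaluate them in parallel (for a nice performance gain). However, if an Expression depends on a column created by another Expression, the two must run in successive with_columns contexts, because Expressions in one context only see the columns that existed when that context started.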

I've purposely created the list of Expressions outside of any query context to demonstrate that Expressions can be generated independently of any query. Later, the list can be supplied to with_columns. This approach helps with debugging and keeps the code clean as you build and test your Expressions.