Polars apply function doesn't pass column name to function

Question

I am migrating my old code from pandas to Polars, but I cannot find a workaround for the apply function. I have a function that performs some calculations on the row data passed in by apply, and it needs to access specific columns by name.

My pandas code, for example:

import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

self.var1 = 'A'
self.var2 = 'B'

def some_calculation(row):
    var  = self.var1
    var2 = self.var2
    cal1 = row[var] + row[var2]
    cal2 = row[var] * row[var2]
    return list(zip(cal1 , cal2))


df['calc_data'] = df.apply(some_calculation, axis=1)

After going through the Polars documentation and trying its apply function, I found that it doesn't pass the column names to the function. Because of this I cannot do the calculations, since my dataframe may have varying columns, and I have not been able to find a solution.

Please help.

[`pl.Expr.apply`](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.apply.html) should be used instead of [`df.apply`](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.apply.html#polars.DataFrame.apply). However, all uses of `.apply` are generally discouraged in Polars. If you show your actual calculation, people can show you how to do it using Polars expressions. — jqurious, Jul 28 '23 at 14:53
I use df.apply in multiple places for different calculations. For example, one of them shifts values in the dataframe based on the shift period of each row, which can be a positive or negative integer: shifting the 1st element of a list by 1 moves it to the 2nd position, and shifting the 3rd element by -1 also moves it to the 2nd position, so both numbers add up in the result. I do that calculation after df['calc_data']. Likewise, I have several other functions where I used df.apply. I have also tried pl.Expr.apply, but it does not pass the column name. — Alby, Jul 28 '23 at 14:58
If you need to use multiple columns in `Expr.apply`, you can use a struct to pass them all in, e.g. https://stackoverflow.com/questions/71658991. However, from what you're describing, it sounds like you can do all of this natively using Polars expressions. — jqurious, Jul 28 '23 at 15:12

score 1 · Accepted Answer · answered Jul 28 '23 at 17:10

The solution is to stop thinking in terms of apply.

You can do something like

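import polars as pl

# Column names held in variables, as in the question
var1, var2 = 'A', 'B'

(
    pl.from_pandas(df)
    .with_columns(
        # Build the list column from two native expressions
        calc_data=pl.concat_list(
            pl.col(var1) + pl.col(var2),
            pl.col(var1) * pl.col(var2),
        )
    )
)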
shape: (3, 3)
┌─────┬─────┬───────────┐
│ A   ┆ B   ┆ calc_data │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═══════════╡
│ 1   ┆ 4   ┆ [5, 4]    │
│ 2   ┆ 5   ┆ [7, 10]   │
│ 3   ┆ 6   ┆ [9, 18]   │
└─────┴─────┴───────────┘

That said, you probably don't actually want the result to be a list, so you would more likely do something like this:

(
    pl.from_pandas(df)
    .with_columns(
        calc1=pl.col(var1) + pl.col(var2),
        calc2=pl.col(var1) * pl.col(var2),
    )
)
shape: (3, 4)
┌─────┬─────┬───────┬───────┐
│ A   ┆ B   ┆ calc1 ┆ calc2 │
│ --- ┆ --- ┆ ---   ┆ ---   │
│ i64 ┆ i64 ┆ i64   ┆ i64   │
╞═════╪═════╪═══════╪═══════╡
│ 1   ┆ 4   ┆ 5     ┆ 4     │
│ 2   ┆ 5   ┆ 7     ┆ 10    │
│ 3   ┆ 6   ┆ 9     ┆ 18    │
└─────┴─────┴───────┴───────┘

If you are adamant about keeping row-wise iteration by column name, which is an anti-pattern in Polars, you can use iter_rows(named=True), which yields a dict for each row.

Your function, as defined, doesn't work for me so I amended it to this...

var1='A'
var2='B'
def some_calculation(row):
    cal1 = row[var1] + row[var2]
    cal2 = row[var1] * row[var2]
    return [cal1 , cal2]

in which case I get...

df['calc_data'] = df.apply(some_calculation, axis=1)
df
   A  B calc_data
0  1  4    [5, 4]
1  2  5   [7, 10]
2  3  6   [9, 18]

Using the generator you can do:

pldf=pl.from_pandas(df)
pldf.with_columns(
    calc_data=pl.Series(some_calculation(x) for x in pldf.iter_rows(named=True))
)
shape: (3, 3)
┌─────┬─────┬───────────┐
│ A   ┆ B   ┆ calc_data │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═══════════╡
│ 1   ┆ 4   ┆ [5, 4]    │
│ 2   ┆ 5   ┆ [7, 10]   │
│ 3   ┆ 6   ┆ [9, 18]   │
└─────┴─────┴───────────┘

That being said, there's little to be gained by "migrating" to Polars if you are going to keep iterating row by row.

Thanks Dean, the solution using the generator worked for me. — Alby, Jul 31 '23 at 05:55
