The solution is to stop thinking in terms of apply.
You can do something like:
(
    pl.from_pandas(df)
    .with_columns(
        calc_data=pl.concat_list(
            pl.col(var1) + pl.col(var2),
            pl.col(var1) * pl.col(var2),
        )
    )
)
shape: (3, 3)
┌─────┬─────┬───────────┐
│ A   ┆ B   ┆ calc_data │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═══════════╡
│ 1   ┆ 4   ┆ [5, 4]    │
│ 2   ┆ 5   ┆ [7, 10]   │
│ 3   ┆ 6   ┆ [9, 18]   │
└─────┴─────┴───────────┘
However, you probably don't actually want the result to be a list, so you'd do something more like:
(
    pl.from_pandas(df)
    .with_columns(
        calc1=pl.col(var1) + pl.col(var2),
        calc2=pl.col(var1) * pl.col(var2),
    )
)
shape: (3, 4)
┌─────┬─────┬───────┬───────┐
│ A   ┆ B   ┆ calc1 ┆ calc2 │
│ --- ┆ --- ┆ ---   ┆ ---   │
│ i64 ┆ i64 ┆ i64   ┆ i64   │
╞═════╪═════╪═══════╪═══════╡
│ 1   ┆ 4   ┆ 5     ┆ 4     │
│ 2   ┆ 5   ┆ 7     ┆ 10    │
│ 3   ┆ 6   ┆ 9     ┆ 18    │
└─────┴─────┴───────┴───────┘
If you are adamant about maintaining the anti-pattern that is row-wise iteration with names, then you can use iter_rows(named=True) to get a generator of dicts, one for each row.
Your function, as defined, doesn't work for me, so I amended it to this:
var1 = 'A'
var2 = 'B'
def some_calculation(row):
    cal1 = row[var1] + row[var2]
    cal2 = row[var1] * row[var2]
    return [cal1, cal2]
in which case I get...
df['calc_data'] = df.apply(some_calculation, axis=1)
df
   A  B calc_data
0  1  4    [5, 4]
1  2  5   [7, 10]
2  3  6   [9, 18]
Using the generator you can do:
pldf = pl.from_pandas(df)
pldf.with_columns(
    calc_data=pl.Series(some_calculation(x) for x in pldf.iter_rows(named=True))
)
shape: (3, 3)
┌─────┬─────┬───────────┐
│ A   ┆ B   ┆ calc_data │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═══════════╡
│ 1   ┆ 4   ┆ [5, 4]    │
│ 2   ┆ 5   ┆ [7, 10]   │
│ 3   ┆ 6   ┆ [9, 18]   │
└─────┴─────┴───────────┘
That being said, there's little to be gained by "migrating" to polars if you are going to maintain rowwise iteration.