Suppose I have a simple Polars DataFrame, generated manually by the code below:
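import polars as pl

cols = ['a', 'b', 'c']
values = ['d', 'e', 'f']
# Three identical rows, plus a column naming which columns to concatenate per row
df = (
    pl.DataFrame({cols[i]: [values[i]] * 3 for i in range(len(cols))})
    .with_columns(
        pl.lit(pl.Series(['a,b', 'b,c', 'a,c'])).alias('Columns to Concatenate')
    )
)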
This produces the following table:
a | b | c | Columns to Concatenate |
---|---|---|---|
d | e | f | a,b |
d | e | f | b,c |
d | e | f | a,c |
For each row, how would I concatenate the values of the columns named in the 'Columns to Concatenate' column, to produce a result like the one below?
a | b | c | Columns to Concatenate | Concatenated Column String |
---|---|---|---|---|
d | e | f | a,b | de |
d | e | f | b,c | ef |
d | e | f | a,c | df |
I've attempted it like this:
(
    df.with_columns(
        pl.concat_str(pl.col('Columns to Concatenate').str.split(','))
        .alias('Concatenated Column String')
    )
)
I'm fairly sure this is not the correct way to do it, and it raises:
ComputeError: Cannot cast list type
I would appreciate some pointers on how to do this in an idiomatic and fast way, without resorting to a row-wise lambda function.