I am new to both Polars and Python in general, and I have a somewhat unusual problem I could use some help with. I have a dataframe with 50-plus columns that each hold 0 or 1. I need to create a new column containing a comma-separated list that names every column holding a 1, using only part of the column name: if hccx = 1, then append x to the string column. A simplified example:
import polars as pl

df = pl.DataFrame(
    {'id': [1, 2, 3], 'hcc1': [0, 1, 1], 'hcc2': [0, 0, 1], 'hcc5': [0, 1, 1], 'hcc8': [1, 0, 0]}
)
shape: (3, 5)
┌─────┬──────┬──────┬──────┬──────┐
│ id  ┆ hcc1 ┆ hcc2 ┆ hcc5 ┆ hcc8 │
│ --- ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64 ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
╞═════╪══════╪══════╪══════╪══════╡
│ 1   ┆ 0    ┆ 0    ┆ 0    ┆ 1    │
│ 2   ┆ 1    ┆ 0    ┆ 1    ┆ 0    │
│ 3   ┆ 1    ┆ 1    ┆ 1    ┆ 0    │
└─────┴──────┴──────┴──────┴──────┘
I want to create a new column (string type), hccall, that looks like the following:
| id | hccall |
|----|--------|
| 1  | 8      |
| 2  | 1,5    |
| 3  | 1,2,5  |
I imagine some kind of list comprehension looping over the columns that start with 'hcc' would work, but I'm stuck. I can write a loop, but I'm not sure how to append to the column from within it. Any slick ideas?