How do I fill in missing factors in a polars dataframe?

Question

I have this dataframe:

import polars as pl
testdf = pl.DataFrame({'date': ['date1', 'date1', 'date1', 'date2', 'date3', 'date3'], 'factor': ['A', 'B', 'C', 'B', 'B', 'C'], 'val': [1, 2, 3, 3, 1, 5]})

Some date/factor combinations are missing. I'd like to fill in the gaps with a value of 0.

Looks like https://stackoverflow.com/a/75902791/ – jqurious May 25 '23 at 18:17

score 1 · Answer 1 · edited May 25 '23 at 19:25

This is what I have so far (with help from the comments below):

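(
    testdf
    # one row holding the unique dates and the unique factors as lists
    .select(pl.col(['date', 'factor']).unique().implode())
    # exploding both lists yields every date/factor combination
    .explode('date')
    .explode('factor')
    # bring back the known values; missing combinations become null
    .join(testdf, how='left', on=['date', 'factor'])
    .fill_null(0)
)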

edited May 25 '23 at 19:25

Dean MacGregor

answered May 25 '23 at 18:26

ste_kwr

You can clean it up a bit by doing the unique part in a `.select()` e.g. `df.select(pl.col("date", "factor").unique().implode()).explode().explode()` – jqurious May 25 '23 at 18:29

Can't seem to get this to work: `'Expr' object has no attribute 'implode'` – ste_kwr May 25 '23 at 18:43

You need to update your version of polars. It was `.list()` in older versions. – jqurious May 25 '23 at 18:44

@ste_kwr another change after your update is that pl.DataFrame wants `schema` instead of `columns` – Dean MacGregor May 26 '23 at 11:00

mishpat · Accepted Answer · 2023-05-25T18:47:47.020

For pure readability/"polars"icity, I think

testdf.pivot(values="val", index="date", columns="factor", aggregate_function="first").melt(id_vars="date", variable_name="factor", value_name="value")

┌───────┬────────┬───────┐
│ date  ┆ factor ┆ value │
│ ---   ┆ ---    ┆ ---   │
│ str   ┆ str    ┆ i64   │
╞═══════╪════════╪═══════╡
│ date1 ┆ A      ┆ 1     │
│ date2 ┆ A      ┆ null  │
│ date3 ┆ A      ┆ null  │
│ date1 ┆ B      ┆ 2     │
│ date2 ┆ B      ┆ 3     │
│ date3 ┆ B      ┆ 1     │
│ date1 ┆ C      ┆ 3     │
│ date2 ┆ C      ┆ null  │
│ date3 ┆ C      ┆ 5     │
└───────┴────────┴───────┘

is good, since it makes clearest what you are trying to do: make the dataframe look like a usual "melted" one. I haven't benchmarked it, though.
