Polars: return dataframe with all unique values of N columns

Question

I have a dataframe that has many rows per combination of the 'PROGRAM', 'VERSION' and 'RELEASE_DATE' columns. I want to get a dataframe with all of the combinations of just those three columns. Would this be a job for groupby or distinct?

thx

score 6 · Accepted Answer · edited Feb 19 '23 at 17:54


Since you are not aggregating anything, use `unique`:

df.select(['PROGRAM','VERSION','RELEASE_DATE']).unique()

If you are working with an eager DataFrame rather than the lazy API, this can also be written as:

df[['PROGRAM','VERSION','RELEASE_DATE']].unique()

edited Feb 19 '23 at 17:54 by Anton Daneyko

answered Mar 07 '22 at 19:47

Can the select version posted above be iterated on? – rchitect-of-info Mar 08 '22 at 12:27
`for prog, vers, rel in df.select(['PROGRAM','VERSION','RELEASE_DATE']).distinct().rows(): ...` – Mar 08 '22 at 19:22

`distinct()` has been deprecated, you should use `unique()` instead – magomar Dec 07 '22 at 09:31
