I would like to keep only the rows of my polars DataFrame that are unique with respect to one column. In the example below, I want to create a new DataFrame whose rows are unique based on the "col_float" column.

Before:

┌───────────┬──────────┬────────────┬────────────┐
│ col_float ┆ col_bool ┆ col_str    ┆ col_date   │
│ ---       ┆ ---      ┆ ---        ┆ ---        │
│ f64       ┆ bool     ┆ str        ┆ date       │
╞═══════════╪══════════╪════════════╪════════════╡
│ 10.0      ┆ true     ┆ 2020-01-01 ┆ 2020-01-01 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 20.0      ┆ false    ┆ 2020-01-01 ┆ 2020-01-01 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 20.0      ┆ true     ┆ 2020-01-01 ┆ 2020-01-01 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 40.0      ┆ false    ┆ 2020-01-01 ┆ 2020-01-01 │
└───────────┴──────────┴────────────┴────────────┘

After:

┌───────────┬──────────┬────────────┬────────────┐
│ col_float ┆ col_bool ┆ col_str    ┆ col_date   │
│ ---       ┆ ---      ┆ ---        ┆ ---        │
│ f64       ┆ bool     ┆ str        ┆ date       │
╞═══════════╪══════════╪════════════╪════════════╡
│ 10.0      ┆ true     ┆ 2020-01-01 ┆ 2020-01-01 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 20.0      ┆ false    ┆ 2020-01-01 ┆ 2020-01-01 │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 40.0      ┆ false    ┆ 2020-01-01 ┆ 2020-01-01 │
└───────────┴──────────┴────────────┴────────────┘

(Note that the third row is dropped because its col_float value, 20.0, already appears in the second row.)
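To make the intended "keep the first row per value" behaviour concrete, here is a minimal sketch in plain Rust that does not use polars at all. The `Row` struct and the functions `unique_by_col_float` and `example_rows` are hypothetical names made up for illustration, and the date column is left out for brevity.

```rust
use std::collections::HashSet;

// Hypothetical row type mirroring the example table; this is not a polars type.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    col_float: f64,
    col_bool: bool,
    col_str: &'static str,
}

// Keep the first row seen for each distinct `col_float`, preserving input order.
fn unique_by_col_float(rows: &[Row]) -> Vec<Row> {
    let mut seen = HashSet::new();
    rows.iter()
        // f64 is not `Hash`, so the raw bit pattern is used as the key.
        // Caveat: 0.0 and -0.0 have different bits and count as distinct here.
        .filter(|r| seen.insert(r.col_float.to_bits()))
        .cloned()
        .collect()
}

// The four rows from the "Before" table.
fn example_rows() -> Vec<Row> {
    vec![
        Row { col_float: 10.0, col_bool: true, col_str: "2020-01-01" },
        Row { col_float: 20.0, col_bool: false, col_str: "2020-01-01" },
        Row { col_float: 20.0, col_bool: true, col_str: "2020-01-01" },
        Row { col_float: 40.0, col_bool: false, col_str: "2020-01-01" },
    ]
}
```

Calling `unique_by_col_float(&example_rows())` keeps the rows with 10.0, 20.0 (the `false` one) and 40.0. This sketch preserves input order; as far as I know, polars' plain `unique` does not promise that, and `unique_stable` is the order-preserving variant.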

Intuitively, one of my attempts was:

use polars::prelude::*;

let mut df = df!(
    "col_float" => &[10.0, 20.0, 20.0, 40.0],
    "col_bool" => &[true, false, true, false],
    "col_str" => &["2020-01-01"; 4],
)
.unwrap();
let mut df2 = DataFrame::new(vec![df[0].clone()]).unwrap();

// Does not compile: `unique` expects column names as the subset, not a DataFrame.
df = df.unique(df2, UniqueKeepStrategy::First);

but got:

expected `Option<&[String]>`, found `DataFrame`

Which was to be expected, of course.

I'm not sure whether I'm using the right function and, if I am, how the subset should be passed. Searching the documentation and GitHub did not help, because the examples and code I found only ever pass "None" as the subset.

smitop
wilaq

1 Answer

This turned out to be less of a polars question and more a gap in my Rust experience: the subset is a slice of column-name Strings, wrapped in Some.

Working example:

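use polars::prelude::*;

let mut df = df!(
    "col_float" => &[10.0, 20.0, 20.0, 40.0],
    "col_bool" => &[true, false, true, false],
    "col_str" => &["2020-01-01"; 4],
)
.unwrap();

// Pass the subset as a borrowed slice of column names wrapped in `Some`.
// `unique` returns a Result, so unwrap it (or use `?`) before reassigning.
df = df
    .unique(Some(&["col_float".to_string()]), UniqueKeepStrategy::First)
    .unwrap();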
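The type in the error message, `Option<&[String]>`, can be explored without polars. Below is a small sketch using a hypothetical stand-in function, `describe_subset`, which has the same parameter type but is not part of polars. It shows three ways to build the argument: a borrowed array literal, a borrowed Vec, and None.

```rust
// Hypothetical stand-in with the same parameter type as the `subset`
// argument from the error message; it is not a polars function.
fn describe_subset(subset: Option<&[String]>) -> String {
    match subset {
        Some(cols) => format!("unique on: {}", cols.join(", ")),
        None => "unique on: all columns".to_string(),
    }
}

// Hypothetical helper collecting the three call shapes.
fn subset_examples() -> Vec<String> {
    // An owned Vec<String> can be lent out as a slice with `as_slice()`.
    let cols: Vec<String> = vec!["col_float".to_string(), "col_bool".to_string()];
    vec![
        // A borrowed array literal: &[String; 1] coerces to &[String].
        describe_subset(Some(&["col_float".to_string()])),
        // A borrowed Vec, useful when the column names are built at runtime.
        describe_subset(Some(cols.as_slice())),
        // No subset at all, as in the examples that only pass None.
        describe_subset(None),
    ]
}
```

The `.to_string()` calls are needed because the slice holds owned `String` values, so a plain `&["col_float"]` (a slice of `&str`) would not match this parameter type.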
wilaq