
After splitting a string into multiple 'words', I want to add a new column, aliased as "count", that holds the number of items in each resulting list.

let df = df![
    "keys" => ["a ab", "a ab abc", "b ba abc abcd", "b ba bbc abcd bbcd"],
    "groups" => ["A", "A", "B", "C"],
]?;

First I split the string:

let out = df.lazy().with_column(col("keys").str().split(" "));

Then I attempt the count:

let out_2 = out.with_columns([col("keys")
      .apply(|s| Ok(s.len()), GetOutput::from_type(DataType::Int32))
      .alias("count")]).collect().unwrap();

This results in the following error message:

mismatched types
expected struct `polars::prelude::Series`, found `usize`

I have no idea how to proceed.

cafce25
fvg

1 Answer

You can use the .arr() method to get a ListNameSpace, which provides a lengths() method that returns the number of elements in each list.

let out_2 = out
        .with_columns([col("keys").arr().lengths().alias("count")])
        .collect()
        .unwrap();
┌─────────────────────────┬────────┬───────┐
│ keys                    ┆ groups ┆ count │
│ ---                     ┆ ---    ┆ ---   │
│ list[str]               ┆ str    ┆ u32   │
╞═════════════════════════╪════════╪═══════╡
│ ["a", "ab"]             ┆ A      ┆ 2     │
│ ["a", "ab", "abc"]      ┆ A      ┆ 3     │
│ ["b", "ba", ... "abcd"] ┆ B      ┆ 4     │
│ ["b", "ba", ... "bbcd"] ┆ C      ┆ 5     │
└─────────────────────────┴────────┴───────┘
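As a cross-check of the counts in the table, here is a minimal plain-Rust sketch that performs the same split-and-count on the sample data without using Polars at all. The helper name `count_words` is hypothetical (it is not a Polars API), and the sketch assumes a single space as the separator, as in the question.

```rust
// Plain-Rust cross-check; this does not use Polars.
// `count_words` is a hypothetical helper name, not part of any library.
fn count_words(keys: &[&str]) -> Vec<u32> {
    keys.iter()
        // Split on a single space (mirroring `str().split(" ")`) and count the pieces.
        .map(|s| s.split(' ').count() as u32)
        .collect()
}

fn main() {
    let keys = ["a ab", "a ab abc", "b ba abc abcd", "b ba bbc abcd bbcd"];
    let counts = count_words(&keys);
    println!("{:?}", counts); // prints [2, 3, 4, 5]
}
```

The result matches the count column above, so the expected values can be verified independently of the Polars version in use.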
BallpointBen
  • Do you have any idea why your solution stopped working in Polars 0.29 (and perhaps even before that version)? – fvg May 08 '23 at 13:24