I want to filter a Polars DataFrame and then get the number of rows.

What I'm doing now seems to work but feels so wrong:

    let item_count = item_df
        .lazy()
        .filter(not(col("status").is_in(lit(filter))))
        .collect()?
        .shape().0;

In a subsequent DataFrame operation I need to use this value as the divisor:

    .with_column(
        col("count")
            .div(lit(item_count as f64))
            .mul(lit(100.0))
            .alias("percentage"),
    );

This is for a tiny dataset (tens of rows), so I'm not worried about performance, but I'd like to learn what the best way would be.

Lars Francke

1 Answer

While there doesn't seem to be a predefined method on `LazyFrame`, you can use Polars expressions:

    use polars::prelude::*;

    let df = df!["a" => [1, 2], "b" => [3, 4]].unwrap();
    dbg!(df.lazy().select([count()]).collect().unwrap());
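
To check the numbers independently of Polars, the calculation from the question can be modelled in plain Rust. This is only a rough sketch with no Polars calls: `Item`, `remaining_rows` and `percentage` are hypothetical names made up for illustration, and the `status` and `count` fields are assumed from the column names in the question.

```rust
/// Hypothetical stand-in for one row of the question's DataFrame.
struct Item {
    status: &'static str,
    count: u32,
}

/// Counts the rows whose status is NOT in `excluded`,
/// mirroring the `not(col("status").is_in(...))` filter.
fn remaining_rows(items: &[Item], excluded: &[&str]) -> usize {
    items
        .iter()
        .filter(|item| !excluded.contains(&item.status))
        .count()
}

/// Expresses `count` as a percentage of `total`,
/// mirroring the `percentage` column in the question.
fn percentage(count: u32, total: usize) -> f64 {
    f64::from(count) / total as f64 * 100.0
}
```

Calling `remaining_rows` with the rows and the excluded statuses gives the divisor, and `percentage` then applies the same divide-then-multiply-by-100 step to each `count`. On a `DataFrame` that has already been collected, `height()` should give the same row count as `shape().0`.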
Niklas Mohrin

And to get the integer value: `df.lazy().select([count().alias("count")]).collect().unwrap().column("count").unwrap().u32().unwrap().get(0).unwrap();` – ecoe, Aug 19 '23 at 23:51