I am working with Polars in Rust and would like to build a DataFrame from the outputs of several aggregation expressions, such as mean, median, std, etc.

    let series_a = Series::new("ID", vec![1, 2, 3, 4]);
    let series_b = Series::new("Amount", vec![10.0, 22.0, 13.3, 54.1]);
    let series_c = Series::new("Name", vec!["Item 1", "Item 2", "Item 3", "Item 4"]);

    let dataframe = DataFrame::new(vec![series_a, series_b, series_c]).unwrap();
    let lazyframe = dataframe.lazy();

    let mean = lazyframe
        .clone()
        .select([col("*").mean()])
        .collect()
        .unwrap();

    let median = lazyframe
        .clone()
        .select([col("*").median()])
        .collect()
        .unwrap();

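    // transpose() turns each one-row result into a single column; because the
    // inputs mix numbers and strings, the result is a str column (see output below).
    let mut mean_transpose = mean.transpose().unwrap();
    let mut median_transpose = median.transpose().unwrap();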

    mean_transpose
        .set_column_names(&["mean"])
        .expect("could not set column names");
    median_transpose
        .set_column_names(&["median"])
        .expect("could not set column names");

    // Append the median column next to the mean column.
    let new_df = mean_transpose
        .hstack(median_transpose.get_columns())
        .unwrap();

    println!("{:?}", new_df);

This produces the desired output:

    ┌──────┬────────┐
    │ mean ┆ median │
    │ ---  ┆ ---    │
    │ str  ┆ str    │
    ╞══════╪════════╡
    │ 3.5  ┆ 3.5    │
    ├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
    │ 25.8 ┆ 13.3   │
    ├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
    │ null ┆ null   │
    └──────┴────────┘

However, I am sure there is a simpler way to do this. Once I use 8-9 of these aggregate expressions, repeating the select/transpose/rename steps for each one makes the code long and hard to maintain.
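One way to keep this manageable as the list grows is to drive everything from a single table of (name, aggregation) pairs, so a ninth statistic is one more entry rather than another select/transpose/rename block. The sketch below shows that idea in plain Rust with no Polars dependency; `mean`, `median`, `summarize` and the `Agg` alias are hypothetical helpers that only illustrate the target layout (one row per input column, one column per statistic), not a Polars API.

```rust
// Plain-Rust sketch (no polars): a table of (name, aggregation) pairs drives the
// computation. All names here are hypothetical, for illustration only.

/// Function-pointer type shared by every statistic.
type Agg = fn(&[f64]) -> Option<f64>;

/// Arithmetic mean; None for an empty column.
fn mean(v: &[f64]) -> Option<f64> {
    if v.is_empty() {
        return None;
    }
    Some(v.iter().sum::<f64>() / v.len() as f64)
}

/// Median (average of the two middle values for even lengths); None if empty.
fn median(v: &[f64]) -> Option<f64> {
    if v.is_empty() {
        return None;
    }
    let mut s = v.to_vec();
    s.sort_by(|a, b| a.total_cmp(b)); // total order, so NaN cannot cause a panic
    let n = s.len();
    Some(if n % 2 == 0 {
        (s[n / 2 - 1] + s[n / 2]) / 2.0
    } else {
        s[n / 2]
    })
}

/// One output row per input column, one cell per statistic (in `stats` order).
fn summarize(
    columns: &[(&str, Vec<f64>)],
    stats: &[(&str, Agg)],
) -> Vec<(String, Vec<Option<f64>>)> {
    columns
        .iter()
        .map(|(name, values)| {
            let cells: Vec<Option<f64>> =
                stats.iter().map(|(_, f)| f(values.as_slice())).collect();
            (name.to_string(), cells)
        })
        .collect()
}

fn main() {
    // Numeric columns from the question; the string column "Name" is left out.
    let columns = vec![
        ("ID", vec![1.0, 2.0, 3.0, 4.0]),
        ("Amount", vec![10.0, 22.0, 13.3, 54.1]),
    ];
    // Adding a statistic means adding one entry here.
    let stats: Vec<(&str, Agg)> = vec![("mean", mean as Agg), ("median", median as Agg)];

    let header: Vec<&str> = stats.iter().map(|(n, _)| *n).collect();
    println!("column\t{}", header.join("\t"));
    for (name, cells) in summarize(&columns, &stats) {
        let row: Vec<String> = cells
            .iter()
            .map(|c| c.map_or("null".to_string(), |x| x.to_string()))
            .collect();
        println!("{name}\t{}", row.join("\t"));
    }
}
```

The design choice is that every statistic shares one function-pointer type, which keeps the table homogeneous; the same "build the list once, loop over it" pattern could plausibly be applied to constructing the Vec of Polars expressions passed to a single select.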

So far, I have managed to get something close with a single select call that uses map_alias:

    let test = lazyframe
        .clone()
        .select([
            col("*").median().map_alias(|f| f.to_owned() + " median"),
            col("*").mean().map_alias(|f| f.to_owned() + " mean"),
        ])
        .collect()
        .unwrap();

which produces:

    ┌───────────┬───────────────┬─────────────┬─────────┬─────────────┬───────────┐
    │ ID median ┆ Amount median ┆ Name median ┆ ID mean ┆ Amount mean ┆ Name mean │
    │ ---       ┆ ---           ┆ ---         ┆ ---     ┆ ---         ┆ ---       │
    │ f64       ┆ f64           ┆ str         ┆ f64     ┆ f64         ┆ str       │
    ╞═══════════╪═══════════════╪═════════════╪═════════╪═════════════╪═══════════╡
    │ 2.5       ┆ 17.65         ┆ null        ┆ 2.5     ┆ 24.85       ┆ null      │
    └───────────┴───────────────┴─────────────┴─────────┴─────────────┴───────────┘

This is close, but what I ideally want is the statistics grouped per original column, as in the previous output: one row per column (ID, Amount, Name) and one column per statistic (mean, median, ...). How would I accomplish this?
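The single-row result above already contains every value; what is missing is a reshape keyed on the "<column> <statistic>" names produced by map_alias. In Polars this would typically be something like a melt/unpivot followed by a pivot (method names and feature flags differ between versions, so check yours). As a hedged, dependency-free sketch of just the regrouping logic, the hypothetical `regroup` function below splits each name at its last space and collects the cells per original column. It assumes statistic names contain no spaces; column names may.

```rust
// Plain-Rust sketch (no polars): regroup a wide row whose cells are named
// "<column> <statistic>" into one row per original column.
// `regroup` is a hypothetical helper, not a Polars API.

/// One regrouped row: original column name plus (statistic, value) cells.
type Row = (String, Vec<(String, Option<f64>)>);

/// Splits each name at its last space, so "Amount median" -> ("Amount", "median").
/// Columns and statistics keep their first-seen order; names with no space are skipped.
fn regroup(wide: &[(&str, Option<f64>)]) -> Vec<Row> {
    let mut rows: Vec<Row> = Vec::new();
    for &(name, value) in wide {
        let Some((column, stat)) = name.rsplit_once(' ') else {
            continue;
        };
        let cell = (stat.to_string(), value);
        match rows.iter().position(|(c, _)| c == column) {
            Some(i) => rows[i].1.push(cell),
            None => rows.push((column.to_string(), vec![cell])),
        }
    }
    rows
}

fn main() {
    // Values copied from the single-row output above.
    let wide = [
        ("ID median", Some(2.5)),
        ("Amount median", Some(17.65)),
        ("Name median", None),
        ("ID mean", Some(2.5)),
        ("Amount mean", Some(24.85)),
        ("Name mean", None),
    ];
    for (column, cells) in regroup(&wide) {
        let rendered: Vec<String> = cells
            .iter()
            .map(|(stat, v)| {
                let shown = v.map_or("null".to_string(), |x| x.to_string());
                format!("{stat}={shown}")
            })
            .collect();
        println!("{column}: {}", rendered.join(", "));
    }
}
```

Running main prints one line per original column, starting with `ID: median=2.5, mean=2.5`.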

Kival M