I am working with polars in Rust and would like to build a single DataFrame from the outputs of several aggregate expressions, such as mean, median, and std. This is what I have so far:
use polars::prelude::*;

// Sample data: one integer, one float, and one string column.
let series_a = Series::new("ID", vec![1, 2, 3, 4]);
let series_b = Series::new("Amount", vec![10.0, 22.0, 13.3, 54.1]);
let series_c = Series::new("Name", vec!["Item 1", "Item 2", "Item 3", "Item 4"]);
let dataframe = DataFrame::new(vec![series_a, series_b, series_c]).unwrap();
let lazyframe = dataframe.lazy();

// Aggregate every column; each result is a single-row DataFrame.
let mean = lazyframe
    .clone()
    .select([col("*").mean()])
    .collect()
    .unwrap();
let median = lazyframe
    .clone()
    .select([col("*").median()])
    .collect()
    .unwrap();

// Transpose so each source column becomes a row, then name the single resulting column.
let mut mean_transpose = mean.transpose().unwrap();
let mut median_transpose = median.transpose().unwrap();
mean_transpose
    .set_column_names(&["mean"])
    .expect("could not set column names");
median_transpose
    .set_column_names(&["median"])
    .expect("could not set column names");

// Place the two single-column frames side by side.
let new_df = mean_transpose
    .hstack(&median_transpose.get_columns())
    .unwrap();
println!("{:?}", new_df);
This produces the desired output:
┌───────┬────────┐
│ mean  ┆ median │
│ ---   ┆ ---    │
│ str   ┆ str    │
╞═══════╪════════╡
│ 2.5   ┆ 2.5    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 24.85 ┆ 17.65  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ null  ┆ null   │
└───────┴────────┘
However, I am sure there is a simpler way to do this. Once I use 8-9 aggregate expressions, repeating the select, transpose, rename, and hstack steps for each one makes the code long and hard to read.
So far I have come up with the following, which gets close:
let test = lazyframe
    .clone()
    .select([
        // Suffix each output column with the name of the aggregate.
        col("*").median().map_alias(|f| f.to_owned() + " median"),
        col("*").mean().map_alias(|f| f.to_owned() + " mean"),
    ])
    .collect()
    .unwrap();
This produces the following output:
┌───────────┬───────────────┬─────────────┬─────────┬─────────────┬───────────┐
│ ID median ┆ Amount median ┆ Name median ┆ ID mean ┆ Amount mean ┆ Name mean │
│ ---       ┆ ---           ┆ ---         ┆ ---     ┆ ---         ┆ ---       │
│ f64       ┆ f64           ┆ str         ┆ f64     ┆ f64         ┆ str       │
╞═══════════╪═══════════════╪═════════════╪═════════╪═════════════╪═══════════╡
│ 2.5       ┆ 17.65         ┆ null        ┆ 2.5     ┆ 24.85       ┆ null      │
└───────────┴───────────────┴─────────────┴─────────┴─────────────┴───────────┘
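This computes everything in a single select, but every statistic of every input column ends up as its own column in one wide row. Ideally I want one row per input column and one column per statistic (mean, median, and so on), as in the first output. How can I accomplish this?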
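
To pin down the layout I am after independently of any polars API, here is a minimal plain-Rust sketch. It does not use polars at all. The names `mean`, `median`, `summarize`, and the `Agg` alias are hypothetical helpers made up for illustration, and modeling the non-numeric Name column as an empty list of values is an assumption. The point is only the shape: the aggregates live in one list of (name, function) pairs, and the result has one row per input column and one value per aggregate.

```rust
/// Arithmetic mean; `None` for an empty slice.
fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    Some(xs.iter().sum::<f64>() / xs.len() as f64)
}

/// Median; averages the two middle values when the length is even.
fn median(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Hypothetical alias: an aggregate maps a column's values to one optional number.
type Agg = fn(&[f64]) -> Option<f64>;

/// Builds one output row per input column, with one value per aggregate.
fn summarize(
    columns: &[(&str, Vec<f64>)],
    aggs: &[(&str, Agg)],
) -> Vec<(String, Vec<Option<f64>>)> {
    columns
        .iter()
        .map(|(name, values)| {
            let stats = aggs.iter().map(|(_, f)| f(values)).collect();
            (name.to_string(), stats)
        })
        .collect()
}

fn main() {
    let columns = vec![
        ("ID", vec![1.0, 2.0, 3.0, 4.0]),
        ("Amount", vec![10.0, 22.0, 13.3, 54.1]),
        // Assumption: a non-numeric column has no values to aggregate.
        ("Name", vec![]),
    ];
    // Adding a ninth statistic means adding one entry here, nothing else.
    let aggs: [(&str, Agg); 2] = [("mean", mean as Agg), ("median", median as Agg)];

    for (name, stats) in summarize(&columns, &aggs) {
        println!("{name}: {stats:?}");
    }
}
```

What I am looking for is the polars equivalent of `summarize`: a single list of aggregates that yields this per-column layout.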