I have a dataset of time series data similar to the following:
use chrono::{Duration, NaiveDate};
use polars::prelude::*;

let series_one = Series::new(
    "a",
    (0..4).map(|v| v as f64).collect::<Vec<_>>(),
);
let series_two = Series::new(
    "b",
    (4..8).map(|v| v as f64).collect::<Vec<_>>(),
);
let series_three = Series::new(
    "c",
    (8..12).map(|v| v as f64).collect::<Vec<_>>(),
);
let series_dates = Series::new(
    "date",
    (0..4)
        // One row per day, starting at NaiveDate::default() (1970-01-01).
        .map(|v| NaiveDate::default() + Duration::days(v))
        .collect::<Vec<_>>(),
);
let df = DataFrame::new(vec![series_one, series_two, series_three, series_dates]).unwrap();
The resulting dataframe looks like this:
shape: (4, 4)
┌─────┬─────┬──────┬────────────┐
│ a ┆ b ┆ c ┆ date │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ date │
╞═════╪═════╪══════╪════════════╡
│ 0.0 ┆ 4.0 ┆ 8.0 ┆ 1970-01-01 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.0 ┆ 5.0 ┆ 9.0 ┆ 1970-01-02 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2.0 ┆ 6.0 ┆ 10.0 ┆ 1970-01-03 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3.0 ┆ 7.0 ┆ 11.0 ┆ 1970-01-04 │
└─────┴─────┴──────┴────────────┘
For every row in the dataframe, I would like to apply a function to the slice of the dataframe that contains that row and all rows before it.
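To make the access pattern concrete, here is a minimal sketch in plain Rust, without polars, of what I mean by applying a function to every growing prefix. The names expanding_apply and window_mean are hypothetical and only serve as illustration; window_mean stands in for whatever the real function computes.

```rust
/// Apply `f` to every expanding prefix `data[..=i]`, i.e. to row `i`
/// together with all rows before it. Hypothetical helper, not a polars API.
fn expanding_apply<T, R>(data: &[T], f: impl Fn(&[T]) -> R) -> Vec<R> {
    (0..data.len()).map(|i| f(&data[..=i])).collect()
}

/// Hypothetical stand-in for the per-window function: the mean of the window.
/// The window is never empty here because every prefix has at least one row.
fn window_mean(window: &[f64]) -> f64 {
    window.iter().sum::<f64>() / window.len() as f64
}
```

Calling expanding_apply(&[0.0, 1.0, 2.0, 3.0], window_mean) evaluates window_mean on the prefixes of length 1, 2, 3 and 4 in turn, which is the same shape of computation I want on the dataframe.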
If I have some function some_fn:
fn some_fn(_df: DataFrame) -> DataFrame {
    // Do some operation with the dataframe slice that doesn't need to mutate any data and returns a
    // new dataframe with some results
    DataFrame::new(vec![
        Series::new("a_result", vec![1.0, 2.0, 3.0, 4.0]),
        Series::new("b_result", vec![5.0, 6.0, 7.0, 8.0]),
        Series::new("c_result", vec![9.0, 10.0, 11.0, 12.0]),
    ])
    .unwrap()
}
and I attempt to do the following:
let size = df.column("a").unwrap().len();
let results = (0..size)
    .map(|i| {
        // Take the first i + 1 rows: row i and everything before it.
        let t = df.head(Some(i + 1));
        some_fn(t)
    })
    .reduce(|acc, b| acc.vstack(&b).unwrap())
    .unwrap();
I find that this is exceedingly slow, taking about 1 ms to process just 3000 rows. That figure comes from benchmarking an empty function, so the time is not due to some heavy computation, just the slicing. What is the right way to take full advantage of polars and do this processing efficiently?