
Is there a simple way to apply a function to a Polars DataFrame in Rust?

Suppose, for example, that my function and DataFrame are the following:

fn double(x: i32) -> i32 {
    x * 2
}

let s0 = Series::new("id", &[1, 2, 3]);
let s1 = Series::new("cost", &[10, 20, 30]);
let mut df = DataFrame::new(vec![s0, s1])?;

Here I'd like to do something that looks like:

df.apply("cost", |x| double(x))

Using pandas, I achieve the same with:

df["cost"] = df["cost"].apply(lambda x: double(x))

I'd love to know the equivalent way to apply a function over a column like this!

Herohtar
Sam
  • Something like `df.column("cost")?.i32()?.apply(double)` – PitaJ Aug 02 '22 at 16:41
  • Thank you, this got me there in the end. I used the line above with replace: `let df = df.replace("cost", df.column("cost")?.i32()?.apply(double));` – Sam Aug 03 '22 at 08:08
  • Found what I believe to be a more elegant solution, closing this now: `df.apply("cost", |x| x.i32().unwrap().apply(double))?;` – Sam Aug 03 '22 at 08:32

1 Answer


This might work, using the lazy API:

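use polars::prelude::*;

fn main() {
    let s0 = Series::new("id", &[1, 2, 3]);
    let s1 = Series::new("cost", &[10, 20, 30]);
    // Switch to the lazy API so the query is only executed on `collect`.
    let df = DataFrame::new(vec![s0, s1]).unwrap().lazy();

    // Declared output type of the mapped column: i32 * i32 stays Int32.
    let o = GetOutput::from_type(DataType::Int32);

    // Double the whole "cost" column and store it as "new_cost".
    let new_df = df.with_column(
        col("cost")
            .map(|x| Ok((&x * 2i32).into_series()), o)
            .alias("new_cost"),
    );

    // `collect` consumes the LazyFrame; the clone keeps `new_df` usable afterwards.
    println!("{:?}", new_df.clone().collect());
}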

Robert
  • Would the cloning incur extra memory overhead if I needed to apply multiple operations? – Sam Aug 11 '22 at 09:16
  • @Sam My understanding of LazyFrames is that Polars optimizes all of your operations when you finally need the data. If you apply multiple operations (e.g. several `df.with_column( ... )` calls), Polars will try to minimize the memory footprint needed. For example, if you first blow up your df (add ~30 columns) and in the end only need the last column, Polars will do its best to give you just that result without actually inflating and deflating your df along the way. – Robert Aug 12 '22 at 19:19
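
As a loose analogy for the lazy behaviour described in the comment above, here is a minimal sketch in plain Rust using only the standard library. It is not the Polars API and says nothing about how Polars optimizes queries internally. The `eager` and `lazy` helpers and the extra `+ 1` step are hypothetical, added only to have two chained operations. The point is that chained steps can be described first and executed in a single pass when the result is requested.

```rust
// Plain-Rust analogy for lazy evaluation; this is NOT the Polars API.
fn double(x: i32) -> i32 {
    x * 2
}

/// Eager style: every step materializes a full intermediate vector.
fn eager(cost: &[i32]) -> Vec<i32> {
    let doubled: Vec<i32> = cost.iter().map(|&x| double(x)).collect();
    let shifted: Vec<i32> = doubled.iter().map(|&x| x + 1).collect();
    shifted
}

/// Lazy style: the iterator adapters only describe the steps;
/// nothing runs until `collect`, which does the work in one pass.
fn lazy(cost: &[i32]) -> Vec<i32> {
    cost.iter().map(|&x| double(x)).map(|x| x + 1).collect()
}

fn main() {
    let cost = [10, 20, 30];
    // Both styles produce the same values; only the intermediates differ.
    assert_eq!(eager(&cost), lazy(&cost));
    println!("{:?}", lazy(&cost));
}
```

Both functions return the same values. The difference is that `eager` allocates one vector per step, while `lazy` allocates only the final one.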