
I'm trying to use polars to apply a function from another library to each row of an input DataFrame. I can't find any examples or tests that use an Expr to apply a function, even one with a single return value, so I'm lost.

The input DataFrame has two float columns, and I want to append three columns generated by a function of this form:

fn f(a: f64, b: f64) -> (f64, f64, f64);

Is there a simple way to do this?

kmdreko

2 Answers


There are different strategies here. You can assign the returned values to different columns, or you can assign them to a single column of type List<Float64>. I will show both.
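In plain Rust terms, the two strategies hold the same numbers in different layouts: a list column stores one three-element list per input row (row-major), while separate columns store one vector per output (column-major). The sketch below uses only the standard library; the black_box body, to_rows and to_columns are hypothetical helpers for illustration, not polars APIs.

```rust
/// Hypothetical stand-in for the user's function (same signature as in the question).
fn black_box(a: f64, b: f64) -> (f64, f64, f64) {
    (a + b, a - b, a * b)
}

/// List-column layout: one [f64; 3] per input row (row-major).
fn to_rows(a: &[f64], b: &[f64]) -> Vec<[f64; 3]> {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let (o1, o2, o3) = black_box(x, y);
            [o1, o2, o3]
        })
        .collect()
}

/// Separate-column layout: one Vec<f64> per output (column-major).
fn to_columns(rows: &[[f64; 3]]) -> [Vec<f64>; 3] {
    [
        rows.iter().map(|r| r[0]).collect(),
        rows.iter().map(|r| r[1]).collect(),
        rows.iter().map(|r| r[2]).collect(),
    ]
}

fn main() {
    let rows = to_rows(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]);
    let columns = to_columns(&rows);
    println!("rows    = {:?}", rows);
    println!("columns = {:?}", columns);
}
```

Going from rows to columns is just a transpose, so the choice is mostly about which layout the rest of your pipeline wants to consume.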

Different columns

Assigning them to different columns does not fit the lazy API well, so we do it with the eager API.

use polars::prelude::*;

/// Your function that takes 2 arguments and returns 3
fn black_box(_a: f64, _b: f64) -> (f64, f64, f64) {
    (1.0, 2.0, 3.0)
}

fn to_different_columns() -> Result<()> {
    let df = df![
        "a" => [1.0, 2.0, 3.0],
        "b" => [1.0, 2.0, 3.0]
    ]?;

    let mut out_1 = vec![];
    let mut out_2 = vec![];
    let mut out_3 = vec![];

    // Walk both columns in lockstep; into_no_null_iter assumes there are no nulls
    df.column("a")?
        .f64()?
        .into_no_null_iter()
        .zip(df.column("b")?.f64()?.into_no_null_iter())
        .for_each(|(a, b)| {
            let (out_val1, out_val2, out_val3) = black_box(a, b);
            out_1.push(out_val1);
            out_2.push(out_val2);
            out_3.push(out_val3);
        });

    let out1 = Series::new("out1", out_1);
    let out2 = Series::new("out2", out_2);
    let out3 = Series::new("out3", out_3);

    // Append the three new columns to the input frame
    let df = df.hstack(&[out1, out2, out3])?;
    dbg!(df);

    Ok(())
}
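The three push calls can also be written with the standard library's Iterator::unzip. unzip only splits pairs, so each triple is reshaped to (x, (y, z)) and the inner pair is unzipped into a (Vec, Vec) tuple, which needs Rust 1.56 or later. This is a stdlib-only sketch: plain slices stand in for the polars columns, and split_outputs and the black_box body are hypothetical. The resulting vectors would then be turned into Series exactly as above.

```rust
/// Hypothetical stand-in for the user's function.
fn black_box(a: f64, b: f64) -> (f64, f64, f64) {
    (a + b, a * b, a - b)
}

/// Applies `black_box` row by row and splits the triples into three columns.
fn split_outputs(a: &[f64], b: &[f64]) -> (Vec<f64>, Vec<f64>, Vec<f64>) {
    // Reshape (o1, o2, o3) into (o1, (o2, o3)) so a single unzip fills all three Vecs.
    let (out_1, (out_2, out_3)): (Vec<f64>, (Vec<f64>, Vec<f64>)) = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| {
            let (o1, o2, o3) = black_box(x, y);
            (o1, (o2, o3))
        })
        .unzip();
    (out_1, out_2, out_3)
}

fn main() {
    let (out_1, out_2, out_3) = split_outputs(&[1.0, 2.0, 3.0], &[1.0, 2.0, 3.0]);
    // With polars, each Vec would become a Series before being added to the frame.
    println!("{:?} {:?} {:?}", out_1, out_2, out_3);
}
```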

List column

If we return a single Series instead, the lazy API is the best fit:

fn to_list() -> Result<()> {
    let df = df![
        "a" => [1.0, 2.0, 3.0],
        "b" => [1.0, 2.0, 3.0]
    ]?;

    let df = df
        .lazy()
        .select([map_multiple(
            |columns| {
                Ok(columns[0]
                    .f64()?
                    .into_no_null_iter()
                    .zip(columns[1].f64()?.into_no_null_iter())
                    .map(|(a, b)| {
                        let out = black_box(a, b);
                        Series::new("", [out.0, out.1, out.2])
                    })
                    .collect::<ListChunked>()
                    .into_series())
            },
            [col("a"), col("b")],
            GetOutput::from_type(DataType::List(Box::new(DataType::Float64))),
        )])
        .collect()?;

    dbg!(df);

    Ok(())
}
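Both snippets iterate with into_no_null_iter, which, as I understand it, is only meant for columns without nulls. If the inputs may contain nulls, one option is to iterate Option<f64> values and emit None for any row with a missing input, rather than calling the function on a meaningless value. On the polars side I believe columns[0].f64()?.into_iter() yields Option<f64> per row, but treat that as an assumption and check it against your polars version. The sketch below shows the row logic with the standard library only; apply_nullable and the black_box body are hypothetical.

```rust
/// Hypothetical stand-in for the user's function.
fn black_box(a: f64, b: f64) -> (f64, f64, f64) {
    (a + b, a * b, a - b)
}

/// Null-aware row-wise apply: a row with a missing input yields `None`.
fn apply_nullable(a: &[Option<f64>], b: &[Option<f64>]) -> Vec<Option<(f64, f64, f64)>> {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| match (x, y) {
            // Only call the function when both inputs are present.
            (Some(x), Some(y)) => Some(black_box(x, y)),
            _ => None,
        })
        .collect()
}

fn main() {
    let a = [Some(1.0), None, Some(3.0)];
    let b = [Some(1.0), Some(2.0), Some(3.0)];
    for row in apply_nullable(&a, &b) {
        println!("{:?}", row);
    }
}
```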
ritchie46

List column (using ChunkedArray):

use polars::prelude::*;
use std::error::Error;

fn to_list() -> Result<(), Box<dyn Error>> {
    let df = df![
        "a" => [1.0, 2.0, 3.0],
        "b" => [1.0, 2.0, 3.0]
    ]?;

    let df = df
        .lazy()
        .select([map_multiple(
            |columns| {
                Ok(Some(
                    columns[0]
                        .f64()?
                        .into_no_null_iter()
                        .zip(columns[1].f64()?.into_no_null_iter())
                        .map(|(a, b)| {
                            let out = black_box(a, b);
                            Series::new("", [out.0, out.1, out.2])
                        })
                        .collect::<ChunkedArray<ListType>>()
                        .into_series(),
                ))
            },
            [col("a"), col("b")],
            // Each row becomes a list of f64, so the declared output type is List(Float64)
            GetOutput::from_type(DataType::List(Box::new(DataType::Float64))),
        )
        .alias("new column")])
        .collect()?;

    dbg!(df);

    Ok(())
}

/// Your function that takes 2 arguments and returns 3
fn black_box(a: f64, b: f64) -> (f64, f64, f64) {
    (a + b, 5.4 * a - 2.1 * b, a * b)
}

to_list()?;

The output:

df = shape: (3, 1)
┌─────────────────┐
│ new column      │
│ ---             │
│ list[f64]       │
╞═════════════════╡
│ [2.0, 3.3, 1.0] │
│ [4.0, 6.6, 4.0] │
│ [6.0, 9.9, 9.0] │
└─────────────────┘
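The table can be cross-checked without polars by running the same black_box over the input rows. Because 5.4 * a - 2.1 * b is computed in binary floating point, the raw f64 may not be exactly 3.3, 6.6 or 9.9 (the polars display rounds), so this sketch compares with a tolerance. expected_rows and close are hypothetical helpers.

```rust
/// The second answer's black_box, unchanged.
fn black_box(a: f64, b: f64) -> (f64, f64, f64) {
    (a + b, 5.4 * a - 2.1 * b, a * b)
}

/// Computes the list values for each input row in plain Rust.
fn expected_rows(a: &[f64], b: &[f64]) -> Vec<[f64; 3]> {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let (o1, o2, o3) = black_box(x, y);
            [o1, o2, o3]
        })
        .collect()
}

/// Approximate equality for values that are not exact in binary floating point.
fn close(x: f64, y: f64) -> bool {
    (x - y).abs() < 1e-9
}

fn main() {
    let rows = expected_rows(&[1.0, 2.0, 3.0], &[1.0, 2.0, 3.0]);
    // Values as shown in the printed table above.
    let printed = [[2.0, 3.3, 1.0], [4.0, 6.6, 4.0], [6.0, 9.9, 9.0]];
    for (row, want) in rows.iter().zip(printed.iter()) {
        assert!(row.iter().zip(want.iter()).all(|(x, y)| close(*x, *y)));
    }
    println!("{:?}", rows);
}
```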
Claudio Fsr