I am trying to create a new computed column from existing columns in a Polars DataFrame in Rust. There is a PySpark-like with_column method for this, but the API documentation has no example for it. Here is an example DataFrame:

use polars::prelude::*;

fn example() {
    let df = df!["foo" => ["A", "A", "B", "B", "C"],
        "val1" => [1, 2, 2, 4, 2],
        "val2" => [1, 2, 2, 4, 2],
        ].unwrap();
    // new column: ration = val1 / val2
    // df.with_column(...)
    println!("{}", df);
}

fn main() {
    example()
}

I want to add a column named ration that holds the ratio of val1 to val2, but the API documentation has no example of this. There is a second issue: the with_column method might need a col type to wrap the column names, as in PySpark, but polars::prelude::* does not bring col into scope. Or maybe some feature needs to be enabled in Cargo.toml.

I am using Polars 0.22.8, the latest version.

Does anyone know how to do this?

DataPsycho

2 Answers

Your initial idea works with the lazy API:

# Cargo.toml
# ...
[dependencies]
polars = { version = "0.22.8", features = ["lazy"] }

// src/main.rs
use polars::prelude::*;

fn example() -> DataFrame {
    let df = df!["foo" => ["A", "A", "B", "B", "C"],
    "val1" => [1, 2, 2, 4, 2],
    "val2" => [1, 2, 2, 4, 2],
    ]
    .unwrap();

    df.lazy()
        .with_column((col("val1") / col("val2")).alias("ration"))
        .collect()
        .unwrap()
}

fn main() {
    let df = example();
    println!("{:?}", df);
}

Output:

shape: (5, 4)
┌─────┬──────┬──────┬────────┐
│ foo ┆ val1 ┆ val2 ┆ ration │
│ --- ┆ ---  ┆ ---  ┆ ---    │
│ str ┆ i32  ┆ i32  ┆ i32    │
╞═════╪══════╪══════╪════════╡
│ A   ┆ 1    ┆ 1    ┆ 1      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ A   ┆ 2    ┆ 2    ┆ 1      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B   ┆ 2    ┆ 2    ┆ 1      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B   ┆ 4    ┆ 4    ┆ 1      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ C   ┆ 2    ┆ 2    ┆ 1      │
└─────┴──────┴──────┴────────┘
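Note that the ration column in the output above has dtype i32, so it cannot hold a fractional ratio, and the sample data hides this because every ratio is exactly 1. The following is a minimal plain-Rust sketch of that element-wise behavior, not Polars code: the helpers int_ratio and float_ratio are hypothetical names, and the val2 values are changed from the question's data (an assumption made only so the ratios differ).

```rust
// Plain-Rust illustration only; these helpers are hypothetical and not part of Polars.

/// Element-wise integer division, mirroring an i32 / i32 column.
/// Panics on a zero divisor, like any Rust integer division.
fn int_ratio(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b).map(|(x, y)| x / y).collect()
}

/// Element-wise division after converting both sides to f64.
fn float_ratio(a: &[i32], b: &[i32]) -> Vec<f64> {
    a.iter()
        .zip(b)
        .map(|(x, y)| f64::from(*x) / f64::from(*y))
        .collect()
}

fn main() {
    let val1 = [1, 2, 2, 4, 2];
    // Changed from the question's data so that the ratios are not all 1.
    let val2 = [2, 2, 4, 4, 3];

    // Integer division drops the fractional part.
    println!("{:?}", int_ratio(&val1, &val2));
    // Converting to f64 first keeps it.
    println!("{:?}", float_ratio(&val1, &val2));
}
```

If a fractional ratio is what you want, the analogous step in the lazy expression would presumably be to cast both columns to a float dtype before dividing, for example with cast(DataType::Float64) on each col(...); check the documentation for your Polars version before relying on that.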
Jandre Marais
It seems most of the advanced features need the lazy API. Not sure why the package author made lazy an optional feature. – DataPsycho Jul 27 '22 at 11:17

I have found a clumsy way of doing this, though I believe there must be a better one. According to the API docs, the argument of df.with_column() must be a type that implements the IntoSeries trait, and elsewhere the docs say that Series implements that trait. So I first create a new Series and then pass it to with_column, which adds it as a new column:

use polars::prelude::*;

fn example() -> DataFrame {
    let mut df = df!["foo" => ["A", "A", "B", "B", "C"],
        "val1" => [1, 2, 2, 4, 2],
        "val2" => [1, 2, 2, 4, 2],
        ].unwrap();
    let ration = Series::new(
        "ration",
         df.column("val1").unwrap()/df.column("val2").unwrap()
    );
    let _ = df.with_column(ration).unwrap();
    df
}

fn main() {
    let df = example();
    println!("{}", df);
}

Result:

shape: (5, 4)
┌─────┬──────┬──────┬────────┐
│ foo ┆ val1 ┆ val2 ┆ ration │
│ --- ┆ ---  ┆ ---  ┆ ---    │
│ str ┆ i32  ┆ i32  ┆ i32    │
╞═════╪══════╪══════╪════════╡
│ A   ┆ 1    ┆ 1    ┆ 1      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ A   ┆ 2    ┆ 2    ┆ 1      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B   ┆ 2    ┆ 2    ┆ 1      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B   ┆ 4    ┆ 4    ┆ 1      │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ C   ┆ 2    ┆ 2    ┆ 1      │
└─────┴──────┴──────┴────────┘
DataPsycho