
I'm processing data with polars in Rust. I need to filter out some values depending on the previous row or on items in other columns. I have read the documentation, but it seems to use built-in methods that test each value on its own, such as gt(), is_not_null(), etc. Here are my needs:

  1. filter out values that break a monotonically increasing sequence in a column
  2. filter a value depending on other values in the same row
  3. modify some values in a row depending on other values in the same row

I think I have to apply a closure that produces a boolean value when filtering, but I can't find a suitable function in the documentation. So how do I apply a closure to filter? Are there other methods that could meet my needs? Thanks in advance.

Haoan
  • In `2.`, what do you mean by `filter a value`? Filter the `Series`, `DataFrame`, ...? What does `3.` have to do with filtering? – cafce25 Dec 07 '22 at 04:40
  • @cafce25 In `2.`, I mean I want to delete a row based on a condition that involves more than one column. Take your sample dataframe below: if I want to delete a row where `df["a"][i] < df["c"][i]`, I don't know how to apply the function. `3.` is actually not a filter question. I mean I want to modify (replace/set) a value based on a condition that involves one or more other values, e.g., `if df["b"][i] == 'a' then set df["a"][i] = null`. – Haoan Dec 07 '22 at 07:06
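
To make the two conditions in the comment above concrete, here is a minimal sketch in plain Rust (std only, not the polars API). The helper names `keep_mask` and `null_a_where_b_is_a` are hypothetical, and the columns are assumed to be plain slices holding the values from the sample dataframe in the answer. It only illustrates the row-wise logic; how best to express it with polars expressions is not covered here.

```rust
/// Need 2: build a mask that is true for rows to keep,
/// i.e. rows where `a < c` does NOT hold.
fn keep_mask(a: &[i32], c: &[i32]) -> Vec<bool> {
    a.iter().zip(c).map(|(&x, &y)| x >= y).collect()
}

/// Need 3: replace `a` with null (None) wherever `b` equals "a";
/// every other value of `a` is passed through unchanged.
fn null_a_where_b_is_a(a: &[i32], b: &[&str]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b)
        .map(|(&x, &s)| if s == "a" { None } else { Some(x) })
        .collect()
}

fn main() {
    // Same values as the sample dataframe in the answer.
    let a = [1, 2, 2, 3, 4, 3];
    let b = ["a", "a", "b", "a", "c", "d"];
    let c = [5, 4, 3, 2, 1, 0];

    // prints [false, false, false, true, true, true]
    println!("{:?}", keep_mask(&a, &c));
    // prints [None, None, Some(2), None, Some(4), Some(3)]
    println!("{:?}", null_a_where_b_is_a(&a, &b));
}
```

Both helpers walk the columns in lockstep with `zip`, so each output element depends only on values from the same row.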

1 Answer


I'm only gonna answer '1.' since '2.' doesn't really make sense and '3.' is an unrelated question. You can keep only the increasing values by building a boolean mask from the series and filtering the dataframe with it:

use polars::prelude::*;
use polars::df;

fn main() {
    let f = df!{
        "a" => [  1,   2,   2,   3,   4,   3],
        "b" => ["a", "a", "b", "a", "c", "d"],
        "c" => [  5,   4,   3,   2,   1,   0],
    }.unwrap();
    let mut previous = i32::MIN;
    let filter_monoton = f["a"].i32().unwrap().into_iter().map(|x| {
        let ret = x.unwrap() > previous; // keep the row only if `a` exceeds the previous row's value
        previous = x.unwrap();
        ret
    }).collect();
    let mono = f.filter(&filter_monoton).unwrap();
    dbg!(&f);
    dbg!(&mono);
}

will output:

[src/main.rs:17] &f = shape: (6, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ 5   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ a   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ b   ┆ 3   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ a   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 4   ┆ c   ┆ 1   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ d   ┆ 0   │
└─────┴─────┴─────┘
[src/main.rs:18] &mono = shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ str ┆ i32 │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ 5   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ a   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ a   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 4   ┆ c   ┆ 1   │
└─────┴─────┴─────┘
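
One caveat: the closure above compares each value with the immediately preceding value, not with the last value that was kept. For an input like 1, 5, 2, 3, 6, the 3 would pass (3 > 2) even though it is below the 5 kept earlier. If the goal is a strictly increasing result, comparing against a running maximum may be closer to it. Here is a minimal sketch in plain Rust (std only, no polars); `strictly_increasing_mask` is a hypothetical helper name, and the column is assumed to contain no nulls.

```rust
/// Build a keep/drop mask that retains a value only if it is strictly
/// greater than the last value that was kept (a running maximum),
/// rather than the previous raw value.
fn strictly_increasing_mask(values: &[i32]) -> Vec<bool> {
    let mut last_kept: Option<i32> = None;
    values
        .iter()
        .map(|&v| {
            // The first value is always kept; later ones must exceed the last kept value.
            let keep = last_kept.map_or(true, |m| v > m);
            if keep {
                last_kept = Some(v);
            }
            keep
        })
        .collect()
}

fn main() {
    let a = [1, 5, 2, 3, 6];
    let mask = strictly_increasing_mask(&a);

    // Apply the mask by hand to show which values survive.
    let kept: Vec<i32> = a
        .iter()
        .zip(&mask)
        .filter_map(|(&v, &k)| k.then_some(v))
        .collect();

    // prints [true, true, false, false, true]
    println!("{:?}", mask);
    // prints [1, 5, 6]
    println!("{:?}", kept);
}
```

For the sample column `a` used above, both approaches give the same mask, so the difference only shows up when a dip is followed by a partial recovery.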
cafce25