
I have some time series data from which I would like to keep only a certain time range for each day. In my case, I want to keep everything between 09:00 and 16:00 (i.e. all values from 09:00:00 to 16:00:00 inclusive).

I've read as much documentation as I can, but since most of the polars documentation is written for Python, I am having quite a hard time finding a solution.

The documentation for polars-rust is also still missing a lot of examples and explanatory text.

This is the only question I can find on the subject, but its solution does not work for me:
"Filter by hour of datetime in rust polars". The code runs, but no filtered result is returned.

I've tried:

let df = LazyCsvReader::new(path)
        .with_parse_dates(true)
        .has_header(true)
        .finish()?
        .collect()?;

let a = df
        .lazy()
        .filter(
            col("time")
                .dt()
                .hour()
                .gt_eq(9)
                .and(col("time").dt().hour().lt_eq(16)),
        )
        .collect();

and it almost works. I get everything from 09:00 to 16:55.

┌─────────────────────┐
│ time                │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2019-09-03 09:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:30:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:35:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:40:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ...                 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:25:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:30:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:45:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:55:00 │
└─────────────────────┘

I've tried adding .and(col("time").dt().minute().eq(0)), but that affects every hour, not just hour 16. I could use hour().lt_eq(15) as a quick fix, but then the result stops at 15:55 and I still miss the final sample at 16:00.

I can't seem to find any way to use if/else statements in these expressions.

Could anyone point me in the right direction?

nklsla
  • What is your expected behavior? If a record was at 16:00:00, would you want it included or only records up to 15:59:59? If the 16:00:00 is not to be included, you could try lt(16) rather than lt_eq(16). – emagers Jan 05 '23 at 23:59
  • To match 16:00:00 or less, you may need to construct a `chrono::DateTime` or `chrono::NaiveDateTime` – PitaJ Jan 06 '23 at 00:02
  • @emagers The expected behavior is to get all values from 9:00-16:00 (inclusive). Using lt(16) will give me up to 15:55 (the time series is sampled at 5 min). – nklsla Jan 06 '23 at 00:11
  • @PitaJ OK, so I have to convert the whole time column to DateTime or NaiveDateTime beforehand? Or can I do it in the polars expression? – nklsla Jan 06 '23 at 00:11
  • Eh I'm not sure. You could maybe try `col("time").dt().hour().lt(16).or(col("time").dt().hour().eq(16).and(col("time").dt().minute().eq(0)))` – PitaJ Jan 06 '23 at 04:17
  • @PitaJ That worked like a charm! Not very intuitive and quite cumbersome but it works! Please add this as an answer so I can mark the question (and give you credit)! Thanks a lot! – nklsla Jan 06 '23 at 09:39

1 Answer


One way to solve this is to get everything before 16:00 OR at exactly 16:00:

let a = df
        .lazy()
        .filter(
            col("time")
                .dt()
                .hour()
                .gt_eq(9)
                .and(
                    // less than 16 handles everything up to 15:59
                    col("time")
                        .dt()
                        .hour()
                        .lt(16)
                        .or(
                            // include also 16:00
                            col("time")
                                .dt()
                                .hour()
                                .eq(16)
                                .and(
                                    col("time").dt().minute().eq(0)
                                )
                        )
                ),
        )
        .collect();
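
To make the boolean logic easier to check, here is a minimal sketch of the same predicate on plain integers, with no polars involved. The function names `in_window_by_parts` and `in_window_by_seconds` are hypothetical and only for illustration. The first function mirrors the filter above; note that it ignores seconds, so a timestamp like 16:00:30 would also pass, which does not matter for data sampled every 5 minutes. The second is an alternative that collapses the time of day into seconds since midnight and checks a single inclusive range. I have not verified how best to express that second form as a polars expression.

```rust
/// Mirrors the filter above on plain integers:
/// hour >= 9 AND (hour < 16 OR (hour == 16 AND minute == 0)).
/// Seconds are ignored, exactly as in the polars expression.
fn in_window_by_parts(hour: u32, minute: u32) -> bool {
    hour >= 9 && (hour < 16 || (hour == 16 && minute == 0))
}

/// Alternative formulation: convert the time of day to seconds since
/// midnight and test the inclusive range 09:00:00..=16:00:00.
fn in_window_by_seconds(hour: u32, minute: u32, second: u32) -> bool {
    let secs = hour * 3600 + minute * 60 + second;
    (9 * 3600..=16 * 3600).contains(&secs)
}
```

With either function, 09:00 and 16:00 are accepted while 08:55 and 16:05 are rejected, which is the inclusive window asked for in the question.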
PitaJ