
I have some time series data in the form of a pl.DataFrame with a datetime column and a data column. I would like to correct an error in the data that occurs during a distinct time range by overwriting the affected rows with a fixed value.

In pandas, one would use the datetimes as the index, slice that time range, and assign to it, like so:

df.loc[start_dt_string:end_dt_string, column_name] = some_val

Being completely new to polars, I have a hard time figuring out how to express this. I tried selecting rows with .filter and .is_between, but of course this doesn't support assignment. How would one go about doing this with polars?

sobek

1 Answer


Apparently I missed this in the docs, so RTFM to the rescue. In the corresponding section of the Coming from Pandas guide, this case is covered almost verbatim:

df.with_columns(
    pl.when(pl.col("c") == 2)
    .then(pl.col("b"))
    .otherwise(pl.col("a")).alias("a")
)

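The pandas example in the question uses time-range slicing, so for the sake of completeness here is polars code that does the same. Note that with_columns returns a new DataFrame rather than modifying df in place, so the result has to be assigned back: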


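from datetime import datetime

import polars as pl

# fromisoformat assumes the bounds are ISO 8601 strings
df = df.with_columns(
    pl.when(
        pl.col(dt_column_name).is_between(
            datetime.fromisoformat(start_dt_string),
            datetime.fromisoformat(end_dt_string),
            closed="both",  # include both endpoints, like pandas label slicing
        )
    )
    .then(pl.lit(some_val))
    .otherwise(pl.col(column_name))
    .alias(column_name)
)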

sobek