
I'm migrating code from pandas to polars. I have time-series data consisting of a timestamp column and a value column, and I need to compute a number of features from it, e.g.

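import numpy as np
import polars as pl
from datetime import datetime, timedelta

# Written against polars 0.18; newer releases use pl.datetime_range for datetime ranges
df = pl.DataFrame({
    "timestamp": pl.date_range(
        datetime(2017, 1, 1),
        datetime(2018, 1, 1),
        timedelta(minutes=15),
        time_zone="Australia/Sydney",
        time_unit="ms",
        eager=True,
    ),
})
value = np.random.normal(0, 1, len(df))
df = df.with_columns([pl.Series(value).alias("value")])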

I need to generate a column indicating whether each timestamp is in standard or daylight saving time. I'm currently using `apply` because, as far as I can see, there isn't a temporal expression for this. My current code is:

def dst(timestamp:datetime):
    return int(timestamp.dst().total_seconds()!=0)

df = df.with_columns(pl.struct(["timestamp"]).apply(lambda x: dst(**x)).alias("dst"))

(this uses a trick that effectively checks if the tzinfo.dst(dt) offset is zero or not)
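For reference, the per-row logic inside `dst()` can be checked with the standard library alone. This is a minimal sketch assuming Python 3.9+ `zoneinfo` with IANA tz data installed; the sample dates are illustrative. January falls in Sydney's summer (AEDT) and July in its winter (AEST), so `tzinfo.dst()` is one hour in the first case and zero in the second.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SYDNEY = ZoneInfo("Australia/Sydney")


def dst(timestamp: datetime) -> int:
    # tzinfo.dst() returns the DST adjustment as a timedelta:
    # 1 hour while Sydney observes AEDT, zero while it observes AEST
    return int(timestamp.dst().total_seconds() != 0)


summer = datetime(2017, 1, 15, 12, 0, tzinfo=SYDNEY)  # southern-hemisphere summer
winter = datetime(2017, 7, 15, 12, 0, tzinfo=SYDNEY)  # southern-hemisphere winter

print(summer.tzname(), summer.dst(), dst(summer))
print(winter.tzname(), winter.dst(), dst(winter))
```

Because this runs once per Python object, it is exactly the kind of per-row work that makes `apply` slow on a full year of 15-minute data.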

Is there a (fast) way of doing this using polars expressions rather than (slow) apply?

FObersteiner
David Waterworth
  • It looks like perhaps a feature request to expose `dst_offset` https://docs.rs/chrono-tz/latest/src/chrono_tz/timezone_impl.rs.html#139 via the `.dt` namespace is what is needed. – jqurious Jun 29 '23 at 06:27
  • are you doing it using `apply` in pandas too, or do they have a vectorised way to do this? – ignoring_gravity Jun 29 '23 at 07:25
  • @ignoring_gravity I think pandas doesn't expose the `dst` property in the `dt` namespace either – FObersteiner Jun 29 '23 at 12:22
  • https://github.com/pola-rs/polars/pull/9629 – jqurious Jun 29 '23 at 21:56
  • Yeah @ignoring_gravity I didn't vectorise it in pandas either, but I've managed to vectorise all my other temporal features so this one is bugging me. – David Waterworth Jun 29 '23 at 22:20
  • 0.18.5 was just released: https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.dt.dst_offset.html – jqurious Jul 05 '23 at 14:52
  • Credits to @MarcoGorelli for implementation. Perhaps you can add it as an answer (or if somebody else wants to). – jqurious Jul 07 '23 at 04:45

2 Answers


You can exploit strftime for this.

(
    df
    .with_columns(
        # %Z formats the zone abbreviation, e.g. AEST (standard) or AEDT (daylight) for Australia/Sydney
        dst=pl.when(pl.col('timestamp').dt.strftime("%Z").str.contains("(DT$)"))
        .then(True)
        .otherwise(False)
    )
)

It relies on the time zone abbreviation ending in "DT" to determine DST status. That works here (AEST/AEDT) and would work for US time zones (e.g. EST/EDT, CST/CDT), but there are many zones where it fails, such as Europe/London (GMT/BST) or Europe/Berlin (CET/CEST).
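To see where the abbreviation heuristic breaks, here is a small stdlib check that applies the same "ends in DT" test to a few zones in their respective summers. It assumes polars' `%Z` output matches Python's `zoneinfo` `tzname()`; both come from the IANA tz database, but that equivalence is an assumption here, not something verified against polars.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# (zone, month) pairs chosen so every timestamp falls in that zone's daylight saving period
samples = [
    ("Australia/Sydney", 1),  # southern-hemisphere summer
    ("America/New_York", 7),
    ("Europe/London", 7),
    ("Europe/Berlin", 7),
]

results = {}
for tz_name, month in samples:
    abbrev = datetime(2017, month, 15, 12, tzinfo=ZoneInfo(tz_name)).tzname()
    # Same test as the regex "(DT$)" in the answer above
    results[tz_name] = (abbrev, abbrev.endswith("DT"))
    print(f"{tz_name:18} {abbrev:5} ends with DT: {abbrev.endswith('DT')}")
```

All four timestamps are in daylight time, yet London (BST) and Berlin (CEST) fail the check, which is why the heuristic is only safe for zones whose abbreviations you know in advance.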

Alternatively, you could use the UTC offset, but it's a lot more convoluted.

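(
    df
    .with_columns(
        # %z formats the UTC offset, e.g. "+1000" or "+1100"; the cast gives 1000 or 1100
        tzoff=pl.col('timestamp').dt.strftime("%z").cast(pl.Int64)
    )
    .join(
        df
        .select(
            tzoff=pl.col('timestamp').dt.strftime("%z").cast(pl.Int64)
        )
        .unique('tzoff')
        .sort('tzoff')
        # One flag per distinct offset (a Series, not a list literal): smaller = standard, larger = daylight
        .with_columns(dst=pl.Series([False, True])),
        on='tzoff',
    )
    .drop('tzoff')
)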

This one assumes that the time zone has only two offsets, that the data actually contains both of them, and that the smaller of the two is standard time and the larger one is daylight saving time.
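The ranking logic behind that join can be sketched in plain Python: collect the distinct UTC offsets, sort them, and label the smaller one standard and the larger one daylight. This is a minimal illustration using `zoneinfo` and hypothetical sample timestamps (one per sampled month of 2017), not the polars code itself; it raises if the two-offset assumption does not hold.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

SYDNEY = ZoneInfo("Australia/Sydney")

# Hypothetical sample data: one timestamp in each sampled month of 2017
stamps = [datetime(2017, month, 15, 12, tzinfo=SYDNEY) for month in (1, 4, 7, 10, 12)]

offsets = [ts.utcoffset() for ts in stamps]
distinct = sorted(set(offsets))

# Same assumption as the join: exactly two offsets, smaller = standard, larger = daylight
if len(distinct) != 2:
    raise ValueError(f"expected exactly two UTC offsets, found {len(distinct)}")
is_dst = {distinct[0]: False, distinct[1]: True}

flags = [is_dst[offset] for offset in offsets]
print(flags)
```

Sydney's DST ended on 2 April 2017 and resumed on 1 October 2017, so mid-April and mid-July map to the smaller offset (+10:00) while January, October and December map to the larger one (+11:00).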

Dean MacGregor

With polars>=0.18.5 the following works

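# dst_offset() is a non-zero Duration (1 hour for Australia/Sydney) during daylight saving time,
# so != 0 gives 1 in daylight time and 0 in standard time, matching the original dst() function
df = df.with_columns((pl.col("timestamp").dt.dst_offset() != 0).cast(pl.Int32).alias("dst"))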
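To illustrate why the `dst_offset` test generalises across zones while `base_utc_offset` does not, here is a stdlib analogue. It assumes that polars' `dt.dst_offset()` corresponds to Python's `tzinfo.dst()` and that `dt.base_utc_offset()` corresponds to `utcoffset() - dst()` (both describe the tz database's offset components); that mapping is my assumption, not something stated in the answer.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

HOUR = timedelta(hours=1)

components = {}
for tz_name in ("Australia/Sydney", "America/New_York"):
    for month in (1, 7):
        ts = datetime(2017, month, 15, 12, tzinfo=ZoneInfo(tz_name))
        dst_offset = ts.dst()                          # analogue of .dt.dst_offset()
        base_utc_offset = ts.utcoffset() - dst_offset  # analogue of .dt.base_utc_offset()
        components[(tz_name, month)] = (base_utc_offset, dst_offset)
        in_dst = dst_offset != timedelta(0)            # the zone-independent test
        print(
            f"{tz_name:18} month={month:2} "
            f"base={base_utc_offset / HOUR:+.1f}h dst={dst_offset / HOUR:+.1f}h in_dst={in_dst}"
        )
```

The base offset differs per zone (+10 h for Sydney, -5 h for New York), so using it to detect DST requires knowing each zone's standard offset in advance, whereas a non-zero DST component means daylight time in both zones.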
David Waterworth
  • [base_utc_offset](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.dt.base_utc_offset.html#polars.Expr.dt.base_utc_offset) might also be useful in this context – FObersteiner Jul 07 '23 at 06:08
  • That would vary between timezones though, you'd have to know what the summer/winter offset was in advance I think - which of course works if you're dealing with a single timezone but using `dst_offset()==0` works in the general case – David Waterworth Jul 07 '23 at 06:11
  • I meant the wider context, not just your specific application. Sorry for the confusion ;-) – FObersteiner Jul 07 '23 at 06:15
  • Ahh right yeah I agree. – David Waterworth Jul 07 '23 at 07:50