Here's our test data to work with:

import polars as pl
import pandas as pd
from datetime import date, time, datetime

df = pl.DataFrame(
    pl.date_range(
        low=date(2022, 1, 3),
        high=date(2022, 9, 30),
        interval="5m",
        time_unit="ns",
        time_zone="UTC",
    ).alias("UTC")
)

I specifically need replace_time_zone, which actually changes the underlying timestamp. However, the same time zone works with convert_time_zone but fails with replace_time_zone.

df.select(
    pl.col("UTC").dt.convert_time_zone(time_zone="America/New_York").alias("US")
)

# output
shape: (77761, 1)
┌────────────────────────────────┐
│ US                             │
│ ---                            │
│ datetime[ns, America/New_York] │
╞════════════════════════════════╡
│ 2022-01-02 19:00:00 EST        │
│ 2022-01-02 19:05:00 EST        │
│ 2022-01-02 19:10:00 EST        │
│ 2022-01-02 19:15:00 EST        │
│ …                              │
│ 2022-09-29 19:45:00 EDT        │
│ 2022-09-29 19:50:00 EDT        │
│ 2022-09-29 19:55:00 EDT        │
│ 2022-09-29 20:00:00 EDT        │
└────────────────────────────────┘

df.select(
   pl.col("UTC").dt.replace_time_zone(time_zone="America/New_York").alias("US")
)

  # error output
  thread '<unnamed>' panicked at 'No such local time', /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/chrono-0.4.23/src/offset/mod.rs:186:34
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[78], line 1
----> 1 df.select(
      2     pl.col("UTC").dt.replace_time_zone(time_zone="America/New_York").alias("US")
      3     )

File ~/Live-usb-storage/projects/python/alpha/lib/python3.10/site-packages/polars/dataframe/frame.py:6432, in DataFrame.select(self, exprs, *more_exprs, **named_exprs)
   6324 def select(
   6325     self,
   6326     exprs: IntoExpr | Iterable[IntoExpr] | None = None,
   6327     *more_exprs: IntoExpr,
   6328     **named_exprs: IntoExpr,
   6329 ) -> Self:
   6330     """
   6331     Select columns from this DataFrame.
   6332 
   (...)
   6429 
   6430     """
   6431     return self._from_pydf(
-> 6432         self.lazy()
   6433         .select(exprs, *more_exprs, **named_exprs)
   6434         .collect(no_optimization=True)
   6435         ._df
   6436     )

File ~/Live-usb-storage/projects/python/alpha/lib/python3.10/site-packages/polars/lazyframe/frame.py:1443, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, no_optimization, slice_pushdown, common_subplan_elimination, streaming)
   1432     common_subplan_elimination = False
   1434 ldf = self._ldf.optimization_toggle(
   1435     type_coercion,
   1436     predicate_pushdown,
   (...)
   1441     streaming,
   1442 )
-> 1443 return pli.wrap_df(ldf.collect())

PanicException: No such local time
FObersteiner
stucash

2 Answers

You cannot replace the time zone of a UTC time series with a time zone that has DST transitions: you'll end up with non-existent and/or ambiguous datetimes. The error could be more informative, but I do not think this is specific to polars.

Here's an illustration. "America/New_York" had a DST transition on Mar 13, 2022; the local times from 02:00 to 02:59 did not exist on that day. A range that ends before the transition works fine:

import polars as pl
from datetime import date

df = pl.DataFrame(
    pl.date_range(
        low=date(2022, 3, 11),
        high=date(2022, 3, 13),
        interval="5m",
        time_unit="ns",
        time_zone="UTC",
    ).alias("UTC")
)

print(
    df.select(
       pl.col("UTC").dt.replace_time_zone(time_zone="America/New_York").alias("US")
    )
)
# shape: (289, 1)
# ┌────────────────────────────────┐
# │ US                             │
# │ ---                            │
# │ datetime[ns, America/New_York] │
# ╞════════════════════════════════╡
# │ 2022-03-11 00:00:00 EST        │
# │ 2022-03-11 00:05:00 EST        │
# │ 2022-03-11 00:10:00 EST        │
# │ 2022-03-11 00:15:00 EST        │
# │ …                              │

while a range that includes the transition doesn't:

df = pl.DataFrame(
    pl.date_range(
        low=date(2022, 3, 13),
        high=date(2022, 3, 15),
        interval="5m",
        time_unit="ns",
        time_zone="UTC",
    ).alias("UTC")
)

print(
    df.select(
       pl.col("UTC").dt.replace_time_zone(time_zone="America/New_York").alias("US")
    )
)
# PanicException: No such local time
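
The same gap can be checked without polars. The following is a minimal sketch using only the Python standard library (`zoneinfo`, Python 3.9+); `exists_in_zone` is a hypothetical helper name, not part of any library. It round-trips a wall-clock time through UTC and reports whether it comes back unchanged.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")


def exists_in_zone(naive: datetime, tz: ZoneInfo) -> bool:
    """Return True if the naive wall-clock time actually occurs in tz."""
    local = naive.replace(tzinfo=tz)
    # A wall time inside a DST gap comes back shifted after a UTC round trip.
    round_trip = local.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive


print(exists_in_zone(datetime(2022, 3, 12, 2, 30), NY))  # True
print(exists_in_zone(datetime(2022, 3, 13, 2, 30), NY))  # False: inside the DST gap
```

A UTC series at 5-minute steps contains 02:00 to 02:55 on every day, so relabelling it as "America/New_York" necessarily hits the gap on Mar 13.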

A workaround is to convert UTC to the desired time zone, then shift the result by that zone's UTC offset. Example:

df = pl.DataFrame(
    pl.date_range(
        low=date(2022, 1, 3),
        high=date(2022, 9, 30),
        interval="5m",
        time_unit="ns",
        time_zone="UTC",
    ).alias("UTC")
)

df = df.with_columns(
       pl.col("UTC").dt.convert_time_zone(time_zone="America/New_York").alias("US")
)

df = df.with_columns(
    (pl.col("US")+(pl.col("UTC")-pl.col("US").dt.replace_time_zone(time_zone="UTC")))
    .alias("US_fakeUTC")
    )

print(df.select(pl.col("US_fakeUTC")))
# shape: (77761, 1)
# ┌────────────────────────────────┐
# │ US_fakeUTC                     │
# │ ---                            │
# │ datetime[ns, America/New_York] │
# ╞════════════════════════════════╡
# │ 2022-01-03 00:00:00 EST        │
# │ 2022-01-03 00:05:00 EST        │
# │ 2022-01-03 00:10:00 EST        │
# │ 2022-01-03 00:15:00 EST        │
# │ …                              │
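
To make the arithmetic of this workaround concrete, here is a standard-library sketch of the same steps applied to single timestamps. It assumes that polars adds the duration to the underlying instant, which is how the sketch models it; `shift_to_wall_time` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")


def shift_to_wall_time(utc_dt: datetime, tz: ZoneInfo) -> datetime:
    """Mirror of US + (UTC - US relabelled as UTC) for a single timestamp."""
    local = utc_dt.astimezone(tz)  # same instant, local wall clock
    # UTC wall clock minus local wall clock, i.e. the negated UTC offset.
    shift = utc_dt - local.replace(tzinfo=timezone.utc)
    # Add the shift to the instant, then display it in the target zone.
    return (utc_dt + shift).astimezone(tz)


print(shift_to_wall_time(datetime(2022, 1, 3, 0, 0, tzinfo=timezone.utc), NY))
# 2022-01-03 00:00:00-05:00
print(shift_to_wall_time(datetime(2022, 3, 13, 2, 30, tzinfo=timezone.utc), NY))
# 2022-03-13 03:30:00-04:00
```

Away from DST transitions the wall clock is preserved. In this model, a wall time that falls inside the DST gap is not reproduced: it comes out one hour later, so values close to a transition are worth checking before relying on them.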
FObersteiner
  • Thanks for the clarification. My use case doesn't stop at replacing/converting the time zone, though: the data is in UTC, and I need to convert it to EST/EDT and then filter on a time range (9 AM to 4 PM); `convert_time_zone` does work with filter. I'll raise a separate question to discuss that. – stucash Mar 21 '23 at 07:28
  • I've added another [question](https://stackoverflow.com/questions/75801389/polars-replace-time-zone-and-convert-time-zone-show-different-string-representat) to discuss `replace_time_zone`; it wasn't `convert_time_zone` that wasn't working, my bad (I can actually work around it using `convert_time_zone`'s string representation). – stucash Mar 21 '23 at 12:50

You need to pass the time zone to date_range directly:

In [4]: import polars as pl
   ...: import pandas as pd
   ...: from datetime import date, time, datetime
   ...:
   ...: df = pl.DataFrame(
   ...:     pl.date_range(
   ...:         low=date(2022, 1, 3),
   ...:         high=date(2022, 9, 30),
   ...:         interval="5m",
   ...:         time_unit="ns",
   ...:         time_zone="America/New_York",
   ...:     ).alias("America/New_York")
   ...: )

In [5]: df
Out[5]:
shape: (77749, 1)
┌────────────────────────────────┐
│ America/New_York               │
│ ---                            │
│ datetime[ns, America/New_York] │
╞════════════════════════════════╡
│ 2022-01-03 00:00:00 EST        │
│ 2022-01-03 00:05:00 EST        │
│ 2022-01-03 00:10:00 EST        │
│ 2022-01-03 00:15:00 EST        │
│ …                              │
│ 2022-09-29 23:45:00 EDT        │
│ 2022-09-29 23:50:00 EDT        │
│ 2022-09-29 23:55:00 EDT        │
│ 2022-09-30 00:00:00 EDT        │
└────────────────────────────────┘

Then, it'll work, because polars can just start at the start time and keep adding 5 minutes, which is always well-defined.

If you first make a UTC date range and then replace the time zone, you will end up with ambiguous or non-existent datetimes (due to DST).
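
The "start at the start time and keep adding 5 minutes" idea can be sketched with the standard library alone. This is an illustration of the principle, not polars' actual implementation; `local_range` is a hypothetical helper name. It steps in UTC, where every step is well-defined, and converts each instant to the local zone.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")


def local_range(start: datetime, end: datetime, step: timedelta, tz: ZoneInfo):
    """Build a tz-aware range from naive local bounds by stepping in UTC."""
    cur = start.replace(tzinfo=tz).astimezone(timezone.utc)
    stop = end.replace(tzinfo=tz).astimezone(timezone.utc)
    out = []
    while cur <= stop:
        out.append(cur.astimezone(tz))  # every instant has exactly one local time
        cur += step
    return out


rng = local_range(datetime(2022, 1, 3), datetime(2022, 9, 30), timedelta(minutes=5), NY)
print(len(rng))  # 77749
print(rng[0], rng[-1])
# 2022-01-03 00:00:00-05:00 2022-09-30 00:00:00-04:00
```

The length matches the shape shown above: it is 12 rows shorter than the 77761-row UTC range because the hour skipped on Mar 13 contributes no 5-minute steps.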

ignoring_gravity
  • Thanks for the explanation, I'll discuss my use case in a different question with more details. But in short, `convert_time_zone` is not working at the moment. – stucash Mar 21 '23 at 07:29
  • sure, feel free to tag me in the question in case I miss it – ignoring_gravity Mar 21 '23 at 09:22
  • I've added another [question](https://stackoverflow.com/questions/75801389/polars-replace-time-zone-and-convert-time-zone-show-different-string-representat) to discuss `replace_time_zone`. I was wrong in saying `convert_time_zone` was not working; if anything, it is irrelevant. It seems to me that it's `replace_time_zone` that needs fixing. – stucash Mar 21 '23 at 12:53