However this is only outputing time >= '2023-07-05 20:00:00'
. What was wrong?
I think it is not 2023-07-05 20:00:00
but 2023-07-05 20:00:00
, note the following - 'America/New_York' is 4 hours behind UTC, so:
SELECT date_format(timestamp '2023-07-05 00:00' AT TIME ZONE 'America/New_York', '%Y-%m-%d %H:%i:%s');
Will result in ('UTC' is the default timezone in Athena AFAIK):
_col0
---------------------
2023-07-04 20:00:00
If your data is originally in some timezone then you need to parse it including this information:
-- sample data
with dataset(year, month, day, hour) as (
values (2023, 7, 4, 23),
(2023, 7, 5, 1),
(2023, 7, 5, 2),
(2023, 7, 5, 3),
(2023, 7, 5, 4)
)
-- query
select PARSE_DATETIME(dt, 'yyyy-M-d H') at time zone 'America/New_York' without_tz,
PARSE_DATETIME(dt || ' America/New_York', 'yyyy-M-d H ZZZ') at time zone 'America/New_York' with_tz
from (
SELECT CONCAT(
CAST(year as VARCHAR),
'-',
CAST(month as varchar),
'-',
cast(day as varchar),
' ',
cast(hour as varchar)) dt
FROM dataset);
Which results in:
without_tz |
with_tz |
2023-07-04 19:00:00.000 America/New_York |
2023-07-04 23:00:00.000 America/New_York |
2023-07-04 21:00:00.000 America/New_York |
2023-07-05 01:00:00.000 America/New_York |
2023-07-04 22:00:00.000 America/New_York |
2023-07-05 02:00:00.000 America/New_York |
2023-07-04 23:00:00.000 America/New_York |
2023-07-05 03:00:00.000 America/New_York |
2023-07-05 00:00:00.000 America/New_York |
2023-07-05 04:00:00.000 America/New_York |
So obviously your filter will fail to produce correct results on the without_tz
column (also note that cast(... as timespan)
seems to ignore timezone info, while literal - timezone ...
will process it).
P.S.
AT TIME ZONE 'America/Los_Angeles' AT TIME ZONE 'America/New_York'
is kind of pointless - it will convert to one timezone and then to another, so basically it should be the same as converting to the final timezone.