I'm using a S3 bucket where the data is organized into files by an ID & year/month - meaning one file per ID & month.
In each (csv.gz) file each record has a timestamp in the format: YYYY-MM-dd HH:mm:ss
(note the missing T
).
Now, when querying the data I want to support datetime granularity down to seconds so naturally it's desired to filter the data in S3 already prior to managing the data in Python.
I can't however find any method to do this.
The function TO_TIMESTAMP
doesn't support a user provided format (expects a T
date/time separator) and combining SUBSTRING and CAST (CAST(SUBSTRING(my_timestamp_column, 1, 10) AS TIMESTAMP)
) yields a The query cannot be evaluated
error.
Is there any way around this?
The documentation states that the function TO_TIMESTAMP
is "the inverse operation of TO_STRING" which is not quite true as the latter supports a time_format_pattern
.