I have an external table in Redshift Spectrum whose DDL defines the datetime column roughly as below:
collector_tstamp TIMESTAMP WITHOUT TIME ZONE
Objective: I am trying to write a certain set of data to Parquet and then add the partition to Spectrum to check whether the values are parsed correctly.
After creating the Parquet file, the following is the metadata of 'collector_tstamp' from that file:
{"metadata": null, "field_name": "collector_tstamp", "name": "collector_tstamp", "numpy_type": "datetime64[ns]", "pandas_type": "datetime"}
Before writing, I convert the pandas DataFrame column to a datetime dtype as follows:
df['collector_tstamp'] = pd.to_datetime(df['collector_tstamp'])
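For context, this is roughly the write path I am using (a minimal sketch; the file name is a placeholder, and I am assuming pyarrow as the engine):

import pandas as pd

# convert the column to a datetime dtype, then write the frame to Parquet
df['collector_tstamp'] = pd.to_datetime(df['collector_tstamp'])
df.to_parquet('collector_tstamp_test.parquet', engine='pyarrow')

After uploading the file to S3, I run ALTER TABLE ... ADD PARTITION in Redshift so that Spectrum picks up the new data.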
Issue: When I query the data in Redshift Spectrum, I see the values below, so the parsing is clearly wrong, but I am not sure which data type I should convert to so that it parses correctly:
collector_tstamp
36332-04-23 15:29:01
36332-04-23 15:29:01
36332-04-23 15:29:01
36332-04-23 15:29:01
36332-04-23 15:29:01
I also tried the following, but I still get the values above:
df['collector_tstamp'] = pd.to_datetime(df['collector_tstamp'], infer_datetime_format=True)
I also tried this, but the query still returns the same values:
df['collector_tstamp'] = df['collector_tstamp'].astype('datetime64[s]')
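For reference, the timestamp type actually written to the file can be checked with pyarrow (a minimal sketch; the file name is a placeholder):

import pyarrow.parquet as pq

# print the timestamp type stored in the Parquet column
schema = pq.read_schema('collector_tstamp_test.parquet')
print(schema.field('collector_tstamp').type)  # e.g. timestamp[ns]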
Sample data:
collector_tstamp
2019-01-04 04:02:36
2019-01-04 04:03:41
2019-01-04 04:03:45
2019-01-04 04:04:11