I need to store timezone info with my timestamp column in an Redshift external table. I am using the below commands:
Create external table:
CREATE EXTERNAL TABLE schema.test
(
user_id BIGINT,
created_by BIGINT,
created_date TIMESTAMP
)
PARTITIONED BY (application_id varchar(100), aggreate_type varchar(200), created DATE)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://<>/'
table properties ('compression_type'='snappy', 'write.parallel'='off', 'write.maxfilesize.mb'='128');
UNLOAD data to S3
UNLOAD ('SELECT
user_id
,created_by
,(cast(created_date as date) || \'T\' || cast(created_date as time) || \'.000Z\') AS created_date
FROM intake.ods_user_resource_role
WHERE res_role_id = \'185814211\'')
TO 's3://<>/'
PARTITION BY (application_id, aggreate_type, created)
access_key_id '<>' secret_access_key '<>'
ENCRYPTED FORMAT AS PARQUET ALLOWOVERWRITE PARALLEL OFF
;
Update table metadata to create partitions:
ALTER TABLE schema.test
ADD PARTITION (application_id='abc', aggreate_type='xyz', created='2018-07-11')
LOCATION 's3://<>/application_id=abc/aggreate_type=xyz/created=2018-07-11';
S3 data sample:
{
"user_id": 12345,
"created_by": 6789,
"created_date": "2018-07-11T17:39:53.000Z"
}
Case 1: When using 'created_date TIMESTAMP' in DDL, I am getting below error:
Error: SQL Error [500310] [XX000]: Amazon Invalid operation: Spectrum Scan Error. File 'https://s3.
<file_location>000.parquet has an incompatible Parquet schema for column 's3://<>-
It is happening due to incompatibility between created_date column. Data in S3 is stored as "2018-07-11T17:39:53.000Z" but column is defined as TIMESTAMP (i.e. TIMESTAMP Withiout Timezone)
Case 2: When using 'created_date TIMESTAMP With Time zone' in DDL, I am getting below error:
Error: SQL Error [500310] [42804]: Amazon Invalid operation: Unsupported data type "pg_catalog.timestamptz" for external table creation;
Please let me know if there is a way to store Timestamp with Timezone("2018-07-11T17:39:53.000Z") in Redshift external table.