2

I need to store timezone info with my timestamp column in an Redshift external table. I am using the below commands:

Create external table:

CREATE EXTERNAL TABLE schema.test
(
    user_id BIGINT,
    created_by BIGINT,
    created_date TIMESTAMP
)
PARTITIONED BY (application_id varchar(100), aggreate_type varchar(200), created DATE)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS 
    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION 's3://<>/' 
table properties ('compression_type'='snappy', 'write.parallel'='off', 'write.maxfilesize.mb'='128');

UNLOAD data to S3

UNLOAD ('SELECT 
    user_id
    ,created_by
    ,(cast(created_date as date) || \'T\' || cast(created_date as time) || \'.000Z\') AS created_date
FROM intake.ods_user_resource_role
WHERE res_role_id = \'185814211\'')
TO 's3://<>/'
PARTITION BY (application_id, aggreate_type, created)
access_key_id '<>' secret_access_key '<>'
ENCRYPTED FORMAT AS PARQUET  ALLOWOVERWRITE  PARALLEL OFF  
;

Update table metadata to create partitions:

ALTER TABLE schema.test 
ADD PARTITION (application_id='abc', aggreate_type='xyz', created='2018-07-11') 
LOCATION 's3://<>/application_id=abc/aggreate_type=xyz/created=2018-07-11';

S3 data sample:

{
  "user_id": 12345,
  "created_by": 6789,
  "created_date": "2018-07-11T17:39:53.000Z"
}

Case 1: When using 'created_date TIMESTAMP' in DDL, I am getting below error:

Error: SQL Error [500310] [XX000]: Amazon Invalid operation: Spectrum Scan Error. File 'https://s3.
<file_location>000.parquet has an incompatible Parquet schema for column 's3://<>-

It is happening due to incompatibility between created_date column. Data in S3 is stored as "2018-07-11T17:39:53.000Z" but column is defined as TIMESTAMP (i.e. TIMESTAMP Withiout Timezone)

Case 2: When using 'created_date TIMESTAMP With Time zone' in DDL, I am getting below error:

Error: SQL Error [500310] [42804]: Amazon Invalid operation: Unsupported data type "pg_catalog.timestamptz" for external table creation;

Please let me know if there is a way to store Timestamp with Timezone("2018-07-11T17:39:53.000Z") in Redshift external table.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Saurabh Pant
  • 73
  • 2
  • 9

1 Answers1

0

According to the AWS documentation timestamp and date are the only acceptable datetime data types in external tables, and the format for timestamp values is different from the one you are creating:

yyyy-mm-dd HH:mm:ss.SSSSSS, for example: 2017-05-01 11:30:59.000000

That's why you're encountering errors.

You have to consider some other options for storing timestamps with timezone offsets, either converting the timestamp values or converting the data type, for example storing them as VARCHAR (CHARACTER VARYING) depending on how you use the data.

RedDree
  • 369
  • 7
  • 18