
I'm trying to use target-redshift to push data to aws-redshift

https://pypi.org/project/target-redshift/

I am using airflow to monitor etl status

This is the error log and I have no clue what it means. Online documentation for target-redshift hardly exists. Is there any way to work around this error?

[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO - ERROR Exception writing records
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO - Traceback (most recent call last):
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 300, in write_batch
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     {'version': target_table_version})
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 840, in write_batch_helper
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     metadata)
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 588, in write_table_batch
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     table=sql.Identifier(remote_schema['name'])
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 65, in execute
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     return super(_MillisLoggingCursor, self).execute(query, vars)
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 461, in execute
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     return super(LoggingCursor, self).execute(query, vars)
[2021-03-07 15:04:11,842] {bash_operator.py:126} INFO - psycopg2.errors.InternalError_: Invalid syntax: Create table sortkey auto with initial sortkeys.
[2021-03-07 15:04:11,842] {bash_operator.py:126} INFO - 
[2021-03-07 15:04:11,842] {bash_operator.py:126} INFO - CRITICAL ('Exception writing records', InternalError_('Invalid syntax: Create table sortkey auto with initial sortkeys.\n'))
  • I've just run into the same error. My data pipeline was working fine until yesterday and now I get this issue. – heron J Mar 10 '21 at 14:44

1 Answer


Underneath your task, somewhere, a query of this form is being run:

CREATE TABLE schema.tablename_copy 
(LIKE schema.tablename)

This is what is throwing the error Invalid syntax: Create table sortkey auto with initial sortkeys.

Fixing or removing the SORTKEY & DISTSTYLE of the origin table fixes the issue, e.g.:

ALTER TABLE schema.tablename ALTER SORTKEY NONE;
ALTER TABLE schema.tablename ALTER DISTSTYLE EVEN;

Depending on what your table actually needs (in terms of sort and dist keys) you might do something else. However, this is more than likely a loading table, and having no SORTKEY shouldn't be a big deal.

Other Approaches: If the above fails, any number of ways of avoiding the LIKE will also work, since this LIKE is more than likely part of a load/upsert sequence, e.g.:

  • Make a temp copy of destination table (using the LIKE statement)
  • Load new data into temp table
  • Remove from the destination table the rows whose IDs are in the new data
  • Insert the new data into the destination table
  • Clean up temp table.
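
The sequence above, written out as Redshift SQL, might look like the following sketch. The table, column, bucket, and IAM role names are placeholders for illustration, not values from the original pipeline:

```sql
-- 1. Make a temp copy of the destination table
--    (this is the LIKE statement that triggers the error)
CREATE TEMP TABLE tablename_staging (LIKE schema.tablename);

-- 2. Load the new data into the temp table
COPY tablename_staging FROM 's3://your-bucket/your-prefix/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-copy-role'
FORMAT AS JSON 'auto';

-- 3. Remove rows from the destination whose IDs are in the new data,
--    then 4. insert the new data, in one transaction
BEGIN;
DELETE FROM schema.tablename
USING tablename_staging
WHERE schema.tablename.id = tablename_staging.id;

INSERT INTO schema.tablename
SELECT * FROM tablename_staging;
COMMIT;

-- 5. Clean up the temp table (also dropped automatically at session end)
DROP TABLE tablename_staging;
```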

Airflow is probably running this sequence of SQL queries behind the scenes for you. You could dismantle the sequence into separate Airflow tasks and avoid the use of LIKE.

  1. DROPPING the origin table (might be possible depending on your use case) - I found that on recreating the table, the issue with LIKE went away.
  2. Explicitly create the duplicate table using a normal CREATE TABLE statement (instead of depending on a LIKE)
  3. Consider doing an INSERT / APPEND / COPY directly into the table and handling duplicates later
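
For option 2, an explicit CREATE TABLE avoids LIKE entirely, so no SORTKEY/DISTSTYLE metadata is inherited from the origin table. The column list here is purely illustrative; match it to your real table's definition:

```sql
-- Declare the staging table explicitly instead of (LIKE schema.tablename);
-- with no SORTKEY/DISTSTYLE clauses, Redshift falls back to its defaults
CREATE TEMP TABLE tablename_staging (
    id         BIGINT,
    payload    VARCHAR(65535),
    updated_at TIMESTAMP
);
```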

Extra info: This is a new error and it is on the Redshift side. I'm not entirely sure why, but the LIKE statement is trying to transfer SORTKEYs and is unable to do so.

If you check the definition of the origin table schema.tablename you'll perhaps find something odd in Redshift's understanding of it.

The Amazon folks have provided a view that lets you generate full table DDL statements: link

Running against the view:

SELECT * FROM 
WHERE schemaname = 'schema' AND tablename = 'tablename'

Will maybe show something useful. In my case the SORTKEY and DISTKEY were both throwing errors, which to me indicated that Redshift had a flawed internal understanding of the keys. What's more, in my case I never set these keys, so I think they were implied by Redshift.
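
As a concrete sketch: the AWS Labs admin view is usually installed as v_generate_tbl_ddl (the admin schema name below is an assumption and depends on how you created the view), and a query against it looks like:

```sql
-- Fetch the reconstructed DDL for one table, in statement order;
-- 'admin' is an assumed schema name, adjust to wherever you created the view
SELECT ddl
FROM admin.v_generate_tbl_ddl
WHERE schemaname = 'schema'
  AND tablename = 'tablename'
ORDER BY seq;
```

Inspecting the SORTKEY and DISTSTYLE lines of that output should show whether Redshift's stored definition matches what you expect.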