
I'm trying to copy data from an S3 bucket to a Redshift database using Airflow. Here is my code:

from airflow.hooks import PostgresHook
path = 's3://my_bucket/my_file.csv'

redshift_hook = PostgresHook(postgres_conn_id='table_name')
access_key='abcd' 
secret_key='aaaa'
query= """
copy my_table 
FROM '%s' 
ACCESS_KEY_ID '%s' 
SECRET_ACCESS_KEY '%s' 
REGION 'eu-west-1' 
ACCEPTINVCHARS 
IGNOREHEADER 1 
FILLRECORD 
CSV
BLANKSASNULL 
EMPTYASNULL 
MAXERROR 100 
DATEFORMAT 'MM/DD/YYYY'
""" % ( path,
        access_key,
        secret_key) 

redshift_hook.run(query)

But when I run this script, it raises the following error:

    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection [SQL: 'SELECT connection.password AS connection_password, connection.extra AS connection_extra, connection.id AS connection_id, connection.conn_id AS connection_conn_id, connection.conn_type AS connection_conn_type, connection.host AS connection_host, connection.schema AS connection_schema, connection.login AS connection_login, connection.port AS connection_port, connection.is_encrypted AS connection_is_encrypted, connection.is_extra_encrypted AS connection_is_extra_encrypted \nFROM connection \nWHERE connection.conn_id = ?'] [parameters: ('elevaate_uk_production',)]

Can I get some help with this, please? Thank you in advance.


1 Answer


Is your connection_id really the same as a table name? You need to go to your Airflow UI at http://………/admin/connections/ and add a Postgres connection for your Redshift cluster. Then put the name of that connection id where you wrote table_name.

While you're at it, define an S3 connection and put the access and secret keys in there. Load it by instantiating an S3Hook with that connection id, then get the keys from it.
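
For example, the hook-based version of your script could look roughly like this (the connection ids are placeholders for whatever you named them in the UI, and on older Airflow versions the S3Hook argument is s3_conn_id rather than aws_conn_id):

from airflow.hooks import PostgresHook, S3Hook

# Placeholder connection ids; use whatever you defined under Admin -> Connections.
redshift_hook = PostgresHook(postgres_conn_id='redshift_conn_id_in_airflow_ui_man')
s3 = S3Hook(aws_conn_id='s3_conn_id_in_airflow_ui_man')
credentials = s3.get_credentials()

query = """
copy my_table
FROM 's3://my_bucket/my_file.csv'
ACCESS_KEY_ID '%s'
SECRET_ACCESS_KEY '%s'
REGION 'eu-west-1'
ACCEPTINVCHARS
IGNOREHEADER 1
FILLRECORD
CSV
BLANKSASNULL
EMPTYASNULL
MAXERROR 100
DATEFORMAT 'MM/DD/YYYY'
""" % (credentials.access_key, credentials.secret_key)

redshift_hook.run(query)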

Lastly, replace your …run(query) with a PostgresOperator. Put the keys in a params dict, then reference them in your SQL string:

from airflow.operators import PostgresOperator
from airflow.hooks import S3Hook

s3 = S3Hook(aws_conn_id="s3_conn_id_in_airflow_ui_man")
redshift_load_task = PostgresOperator(
    task_id="redshift_load",  # every operator needs a task_id
    sql="""
copy my_table
FROM '{{ params.source }}'
ACCESS_KEY_ID '{{ params.access_key }}'
SECRET_ACCESS_KEY '{{ params.secret_key }}'
REGION 'eu-west-1'
ACCEPTINVCHARS
IGNOREHEADER 1
FILLRECORD
CSV
BLANKSASNULL
EMPTYASNULL
MAXERROR 100
DATEFORMAT 'MM/DD/YYYY'
""",
    postgres_conn_id="redshift_conn_id_in_airflow_ui_man",
    database="uh_you_tell_me",
    params={
        'source': 's3://my_bucket/my_file.csv',
        'access_key': s3.get_credentials().access_key,
        'secret_key': s3.get_credentials().secret_key,
    },
)
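
Note that the sql argument of PostgresOperator is templated, which is why the {{ params.source }} style placeholders get rendered at run time, and like any operator the task still has to be attached to a DAG (pass dag=your_dag or instantiate it inside a with DAG(...): block).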