
I am trying to write a Python script that connects to Amazon Redshift, creates a table, and copies data from S3 into that table.

My code is:

import psycopg2
import os
#import pandas as pd
import requests
requests.packages.urllib3.disable_warnings()

redshift_endpoint = os.getenv("END-point")
redshift_user = os.getenv("user")
redshift_pass = os.getenv("PASSWORD")
port = 5439
dbname = 'DBNAME'
conn = psycopg2.connect(
    host="", 
    user='', 
    port=5439, 
    password='', 
    dbname='')
cur = conn.cursor()
aws_key = os.getenv("access_key") # needed to access S3 Sample Data
aws_secret = os.getenv("secret_key")
#aws_iam_role= os.getenv('iam_role') #tried using this too

base_copy_string= """copy %s from 's3://mypath/%s'.csv 
credentials 'aws_access_key_id= %s aws_access_secrect_key= %s'
delimiter '%s';""" # the base COPY string that we'll be using

#easily generate each table that we'll need to COPY data from
tables = ["employee"]
data_files = ["test"]
delimiters = [","]
#the generated COPY statements we'll be using to load data;
copy_statements = []
for tab, f, delim in zip(tables, data_files, delimiters):
    copy_statements.append(base_copy_string % (tab, f, aws_key, aws_secret, delim)%)
#create Table
cur.execute(""" create table employee(empname varchar(30),empno integer,phoneno integer,email varchar(30))""")
for copy_statement in copy_statements: # execute each COPY statement
    cur.execute(copy_statement)
conn.commit()
for table in tables + ["employee"]:
    cur.execute("select count(*) from %s;" % (table,))    
    print(cur.fetchone())
conn.commit() # make sure data went through and commit our statements permanently.

When I run this script, I get an error at `cur.execute(copy_statement)`:

**Error:**
  error:     Invalid credentials. Must be of the format: credentials 'aws_iam_role=...' or 'aws_access_key_id=...;aws_secret_access_key=...[;token=...]'
  code:      8001
  context:
  query:     582
  location:  aws_credentials_parser.cpp:114
  process:   padbmaster [pid=18692]

Is there a problem in my code, or is it an AWS access_key problem?

I even tried using an IAM role, but I get an error:

IAM role cannot assume role even in Redshift

I granted the IAM role permissions by attaching the managed S3FullAccess policy.


3 Answers

2

There are some errors in your script.

1) Change base_copy_string as below:

base_copy_string= """copy %s from 's3://mypath/%s.csv' credentials 'aws_access_key_id=%s;aws_secret_access_key=%s' delimiter '%s';""" # the base COPY string that we'll be using

A ; must be added between the two credential fields, and the single quotes were misplaced: the .csv extension belongs inside the quoted S3 path. Also, the parameter is aws_secret_access_key, not aws_access_secrect_key.

Check this link for detailed info: http://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-access-permissions.html#copy-usage_notes-iam-permissions

I suggest you use IAM roles instead of access keys: http://docs.aws.amazon.com/redshift/latest/dg/loading-data-access-permissions.html
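For reference, a minimal sketch of the IAM-role form of the template (the account id and role name in the ARN are placeholders you would substitute with your own):

iam_copy_string = """copy %s from 's3://mypath/%s.csv'
credentials 'aws_iam_role=arn:aws:iam::<account-id>:role/<role-name>'
delimiter '%s';"""  # placeholder ARN; fill in your account id and role name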

2) Change copy_statements.append as below (remove the extra % at the end):

copy_statements.append(base_copy_string % (tab, f, aws_key, aws_secret, delim))

Correct these and try again.
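Putting the two fixes together, a minimal sketch of the affected part of the script (the bucket path, table, and file names are the ones from your question; the keys still come from the environment):

import os

aws_key = os.getenv("access_key")
aws_secret = os.getenv("secret_key")

# fix 1: .csv inside the quotes, ';' between the two credential fields,
# and the correct spelling aws_secret_access_key
base_copy_string = """copy %s from 's3://mypath/%s.csv'
credentials 'aws_access_key_id=%s;aws_secret_access_key=%s'
delimiter '%s';"""

tables = ["employee"]
data_files = ["test"]
delimiters = [","]

copy_statements = []
for tab, f, delim in zip(tables, data_files, delimiters):
    # fix 2: no stray % after the closing parenthesis
    copy_statements.append(base_copy_string % (tab, f, aws_key, aws_secret, delim))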

Madhukar Mohanraju
  • Thank you. Now I get this error: copy_statements.append(base_copy_string % (tab,f,aws_key,aws_secret,delim)) TypeError: not all arguments converted during string formatting – Bandaru Venkat Oct 07 '17 at 20:58
  • Use `str(variable_name)` to typecast the variables to strings. Probably aws_key or aws_secret is causing the error: `copy_statements.append(base_copy_string % (tab, f, str(aws_key), str(aws_secret), delim))` – Madhukar Mohanraju Oct 08 '17 at 06:49
1

To start with, NEVER, NEVER, NEVER hardcode access keys and secret keys in your code. So that rules out your first approach. Now, coming to the right way of implementing things: you are right, an IAM role is the right way to do it. Unfortunately, I can't work out the exact error and use case from your description. As far as I understand, you are trying to run this Python file from your computer (local machine). Hence, you need to attach permissions to your IAM user to have access to Redshift (and all other services your code touches). Please correct me if my assumption is wrong.
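If the missing piece really is the IAM user's permissions, a minimal sketch with boto3, assuming the script runs under an IAM user that lacks Redshift access (the user name is a placeholder, and AmazonRedshiftFullAccess is just one example of a managed policy you might attach):

import boto3

iam = boto3.client("iam")

# Attach a managed policy to the IAM user that runs the script.
# "my-user" is a placeholder for your actual IAM user name.
iam.attach_user_policy(
    UserName="my-user",
    PolicyArn="arn:aws:iam::aws:policy/AmazonRedshiftFullAccess",
)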

Dishant Kapadiya
  • Yes, that's what I am trying to do: send data from S3 to Redshift. – Bandaru Venkat Oct 07 '17 at 18:59
  • The right way to say it is pulling data from S3. So you need to give the Redshift resources permission to access S3 (which you are doing). It would be helpful if you could attach a screenshot of the error you get in the IAM role case. – Dishant Kapadiya Oct 07 '17 at 19:24
  • ----------------------------------------------- error: User arn:aws:redshift:us-east-1:028810420564:dbuser:my-cluster/venkat is not authorized to assume IAM Role arn:aws:iam::028810420 564:role/redshift-s3 code: 8001 context: IAM Role = arn:aws:iam::028810420564:role/redshift-s3 query: 3209 location: xen_aws_credentials_mgr.cpp:229 process: padbmaster [pid=19102] ----------------------------------------------- – Bandaru Venkat Oct 07 '17 at 19:46
  • Okay, so I found that your IAM ARN has a space character in it: arn:aws:iam::028810420 564:role/redshift-s3. This is an invalid format, and maybe there was an error in copying the ARN of the IAM role. Besides that, take care that you specify the whole ARN every time. – Dishant Kapadiya Oct 07 '17 at 19:57
-1

Just in case you missed it: install the AWS CLI, run aws configure, and enter your credentials and region. Hope this helps.

  • It's really hard to tell what to do from what you wrote. Please consider editing. – norok2 Oct 07 '17 at 16:49
  • This is irrelevant since he is not using the AWS CLI or a library such as boto or boto3. He's passing his credentials into the COPY statement itself. – Zach King Oct 07 '17 at 17:14