17

This is my Python code and I want to connect my Amazon Redshift database to Python, but it is showing an error about the host.

Can anyone tell me the correct syntax? Am I passing all the parameters correctly?

con=psycopg2.connect("dbname = pg_table_def, host=redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com, port= 5439, user=me, password= secret")

This is the error:

OperationalError: could not translate host name "redshift://redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com," to address: Unknown host

GG.
vihaa_vrutti

5 Answers

43

It appears that you wish to run Amazon Redshift queries from Python code.

The parameters you would want to use are:

  • dbname: This is the name of the database you entered in the Database name field when the cluster was created.
  • user: This is the value you entered in the Master user name field when the cluster was created.
  • password: This is the value you entered in the Master user password field when the cluster was created.
  • host: This is the Endpoint provided in the Redshift management console (without the port at the end): redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com
  • port: 5439

For example:

con=psycopg2.connect("dbname=sales host=redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com port=5439 user=master password=secret")
John Rotenstein
  • Hello sir, I did the same, but it is giving me an operational error on the host name. I have included the error above; please look at it and help me. – vihaa_vrutti Jul 21 '17 at 04:40
  • Your host name is different from my host name. It is just the DNS name, not the extra stuff. Your dbname is also different. It should just be the name you entered when the cluster was created. – John Rotenstein Jul 21 '17 at 04:42
  • Yes, I know, but my host name is correct and I am sure about it, though it's still giving me an error. – vihaa_vrutti Jul 21 '17 at 04:44
  • 1
    The host should be set to a name that can be resolved by DNS. A hostname starting with `redshift://` is invalid. Your dbname is also invalid. – John Rotenstein Jul 21 '17 at 04:56
  • Yes, I removed that and corrected both, then ran the code, but the error is still the same. – vihaa_vrutti Jul 21 '17 at 05:01
  • Then you should Edit your question because it is still showing the old values. – John Rotenstein Jul 21 '17 at 05:02
  • The dbname should be the name of the database table which I have in Redshift, right? – vihaa_vrutti Jul 21 '17 at 05:02
  • No, as per my answer above, the dbname is the name of the database you entered in the "Database name" field when the cluster was created. It is *not* the name of a table. – John Rotenstein Jul 21 '17 at 05:03
  • Yes, I have changed it to the database name, and the username and password are changed to what I have for Amazon Redshift. I have now edited the question, though I am getting the same error. – vihaa_vrutti Jul 21 '17 at 05:07
  • Are you still receiving the error "Unknown host"? If you are using `redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com` as your host, then the reason is that this DNS name does not resolve, probably because it does not exist. You should use the Endpoint listed in your Amazon Redshift console. You can test it by doing a PING against the name, which should return an IP address. – John Rotenstein Jul 21 '17 at 05:10
  • Actually, the correct host name is `host=redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com` and this one does exist; I didn't post the correct host name at first. – vihaa_vrutti Jul 21 '17 at 05:12
  • Yes, that name works. Please update the Question to show your current error message. – John Rotenstein Jul 21 '17 at 05:14
  • 2
    Ah! It's the commas. I had my example wrong. See [The psycopg2 module content](http://initd.org/psycopg/docs/module.html) and there are two formats -- one as a long string with no commas, and another with comma-separated individual arguments. So, remove the commas from your string and see what happens. – John Rotenstein Jul 21 '17 at 05:17
  • Hello sir, I want to reach you as I am having a problem with AWS VPC. Can you help me resolve it? – vihaa_vrutti Jul 21 '17 at 06:33
  • This appears to be a different topic, so please create a new Question. – John Rotenstein Jul 21 '17 at 06:40
  • I have posted that question. – vihaa_vrutti Jul 21 '17 at 06:42
  • Can you tell me how to reach you? – vihaa_vrutti Jul 21 '17 at 06:43
  • Thank you sir for your help. – vihaa_vrutti Sep 14 '17 at 09:59
  • What if my redshift url is a jdbc connection string like: jdbc:redshift://..com:5439/?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory – Jwan622 Sep 09 '19 at 17:19
  • @Jwan622 Please create a new question rather than asking via a comment on an old question. – John Rotenstein Sep 09 '19 at 23:34
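Since the comment thread ends by identifying the commas as the culprit, here is a minimal sketch of the two parameter formats psycopg2 accepts (the credential values are placeholders, and the `connect` calls are commented out because they need a live cluster):

```python
# Format 1: a single DSN string of space-separated key=value pairs.
# There must be NO commas anywhere inside the string.
dsn = (
    "dbname=sales "
    "host=redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com "
    "port=5439 user=master password=secret"
)
# con = psycopg2.connect(dsn)

# Format 2: individual keyword arguments. Commas are fine here,
# because they separate Python arguments, not fields in one string.
# con = psycopg2.connect(
#     dbname="sales",
#     host="redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com",
#     port=5439,
#     user="master",
#     password="secret",
# )
```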
2

Old question but I just arrived here from Google.

The accepted answer doesn't work with SQLAlchemy, although it's powered by psycopg2:

sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'dbname=... host=... port=... user=... password=...'

What worked:

create_engine(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")

Which works with psycopg2 directly too:

psycopg2.connect(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")

Using the postgresql dialect works because Amazon Redshift is based on PostgreSQL.
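One caveat with building the URL via an f-string: if the password contains characters such as `@` or `/`, they must be percent-encoded first, or the URL will not parse. A small sketch using only the standard library (all credential values here are placeholders):

```python
from urllib.parse import quote_plus

REDSHIFT_USER = "master"
REDSHIFT_PASSWORD = "p@ss/word"  # placeholder containing URL-special characters
REDSHIFT_HOST = "example.cooqucvshoum.us-west-2.redshift.amazonaws.com"
REDSHIFT_PORT = 5439
REDSHIFT_DATABASE = "sales"

# Percent-encode the password so '@' and '/' cannot break URL parsing.
url = (
    f"postgresql://{REDSHIFT_USER}:{quote_plus(REDSHIFT_PASSWORD)}"
    f"@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}"
)
# engine = create_engine(url)
```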

Hope it can help other people!

GG.
0

To connect to Redshift, you need the postgresql+psycopg2 dialect. For Python 3.x, install the driver with:

pip3 install psycopg2-binary

And then use:

from urllib.parse import quote as urlquote  # escapes special characters in the password

return create_engine(
    "postgresql+psycopg2://%s:%s@%s:%s/%s"
    % (REDSHIFT_USERNAME, urlquote(REDSHIFT_PASSWORD), REDSHIFT_HOST,
       REDSHIFT_PORT, REDSHIFT_DB)
)
-2

For Redshift, the usual approach is to COPY data from S3, which is faster than any other method. Here is an example of how to do it:

First you must install some dependencies.

For Linux users:

sudo apt-get install libpq-dev

For Mac users:

brew install libpq

Then install the Python packages with pip:

pip3 install psycopg2-binary sqlalchemy sqlalchemy-redshift

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker


#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "dwtest"
USER = "youruser"
PASSWORD = "yourpassword"
HOST = "dwtest.awsexample.com"
PORT = "5439"
SCHEMA = "public"

S3_FULL_PATH = 's3://yourbucket/category_pipe.txt'
ARN_CREDENTIALS = 'arn:aws:iam::YOURARN:YOURROLE'
REGION = 'us-east-1'

############ CONNECTING AND CREATING SESSIONS ############
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
###########################################################



############ RUNNING COPY ############
copy_command = '''
copy category from '%s'
credentials 'aws_iam_role=%s'
delimiter '|' region '%s';
''' % (S3_FULL_PATH, ARN_CREDENTIALS, REGION)
s.execute(copy_command)
s.commit()
######################################



############ GETTING DATA ############
query = "SELECT * FROM category;"
rr = s.execute(query)
all_results =  rr.fetchall()

def pretty(all_results):
    for row in all_results :
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row :
            print(" ---- %s" % r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")

pretty(all_results)
s.close()
######################################
Paulo Victor
-3

The easiest way to query AWS Redshift from Python is through this Jupyter extension - Jupyter Redshift

Not only can you query and save your results, but you can also write them back to the database from within the notebook environment.

sat