
Amazon Redshift's Getting Started Guide mentions that you can use SQL client tools compatible with PostgreSQL to connect to your Amazon Redshift cluster.

The tutorial uses the SQL Workbench/J client, but I'd like to use Python (in particular SQLAlchemy). I've found a related question, but it doesn't go into detail about the Python script that connects to the Redshift cluster.

I've been able to connect to the cluster via SQL Workbench/J, since I have the JDBC URL, as well as my username and password, but I'm not sure how to connect with SQLAlchemy.

Based on this documentation, I've tried the following:

from sqlalchemy import create_engine
engine = create_engine('jdbc:redshift://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy')

ERROR:

Could not parse rfc1738 URL from string 'jdbc:redshift://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy'
Chris
  • Have you tried using the Postgres engine? – kylieCatt Jan 26 '16 at 00:26
  • Expanding on the above comment, in your connection string you're using `jdbc:redshift:`, but that means it's trying to connect to the Redshift endpoint, not the Postgres adaptor for your Redshift DB. I don't know if Redshift gives you a different connection endpoint (maybe it's the same hostname but a different port)? – Tom Dalton Jan 26 '16 at 00:51
  • Have you looked at https://sqlalchemy-redshift.readthedocs.org/en/latest/? – van Jan 26 '16 at 05:05

5 Answers


I don't think SQLAlchemy "natively" knows about Redshift. You need to change the JDBC "URL" string to use postgres.

jdbc:postgres://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy

Alternatively, you may want to try sqlalchemy-redshift, following the instructions they provide.
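For reference, SQLAlchemy expects an RFC 1738 style URL rather than a JDBC one, so the string above would still fail to parse. A minimal sketch of the equivalent SQLAlchemy URL, assuming placeholder credentials and the psycopg2 driver:

from sqlalchemy import create_engine

# USER and PASSWORD are placeholders for the cluster credentials.
engine = create_engine(
    'postgresql+psycopg2://USER:PASSWORD'
    '@shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy'
)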

Joe Harris
  • I tried sqlalchemy-redshift but I get the error "pkg_resources.DistributionNotFound: The 'psycopg2>=2.5' distribution was not found and is required by the application," which I tried to solve by pip install Psycopg, which gives me the error "Please add the directory containing pg_config to the PATH or specify the full executable path with the option python setup.py build_ext --pg-config /path/to/pg_config build ... or with the pg_config option in 'setup.cfg'." This is where I'm stuck with sqlalchemy-redshift. – Chris Jan 30 '16 at 15:41
  • As a result, I tried create_engine('jdbc:postgres://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy'), but I get the following error: "sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'jdbc:postgres://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy'" – Chris Jan 30 '16 at 15:42

I was running into the exact same issue, and then I remembered to include my Redshift credentials:

eng = create_engine('postgresql://[LOGIN]:[PASSWORD]@shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy')
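
A quick way to verify that the engine actually connects (a minimal sketch; `select 1` is just a smoke test):

from sqlalchemy import text

with eng.connect() as conn:
    # Should print 1 if the credentials and endpoint are right.
    print(conn.execute(text("select 1")).scalar())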
MDan

sqlalchemy-redshift works for me, but only after a few days of research into the right packages (Python 3.4):

SQLAlchemy==1.0.14
sqlalchemy-redshift==0.5.0
psycopg2==2.6.2

First of all, I checked that my query works in SQL Workbench (http://www.sql-workbench.net), then I got it working in SQLAlchemy (this answer https://stackoverflow.com/a/33438115/2837890 helped me realize that autocommit, or session.commit(), is required):

from sqlalchemy import create_engine, text

# config is assumed to be a dict-like object holding the cluster parameters.
db_credentials = (
    'redshift+psycopg2://{p[redshift_user]}:{p[redshift_password]}'
    '@{p[redshift_host]}:{p[redshift_port]}/{p[redshift_database]}'
    .format(p=config['Amazon_Redshift_parameters']))
engine = create_engine(db_credentials, connect_args={'sslmode': 'prefer'})
connection = engine.connect()
# COPY must be committed, hence the autocommit execution option.
result = connection.execute(text(
    "COPY assets FROM 's3://xx/xx/hello.csv' WITH CREDENTIALS "
    "'aws_access_key_id=xxx_id;aws_secret_access_key=xxx'"
    " FORMAT csv DELIMITER ',' IGNOREHEADER 1 ENCODING UTF8;"
).execution_options(autocommit=True))
result = connection.execute("select * from assets;")
print(result, type(result))
print(result.rowcount)
connection.close()

After that, I got sqlalchemy_redshift's CopyCommand to work, perhaps in a bad way; it looks a little tricky:

import sqlalchemy as sa
from sqlalchemy_redshift.dialect import CopyCommand, RedshiftDialect

# A bare Table object; only the table name matters for composing the COPY.
assets = sa.Table('assets', sa.MetaData())
# access_key_id and secret_access_key are assumed to be defined elsewhere.
copy = CopyCommand(
    assets,
    data_location='s3://xx/xx/hello.csv',
    access_key_id=access_key_id,
    secret_access_key=secret_access_key,
    truncate_columns=True,
    delimiter=',',
    format='CSV',
    ignore_header=1,
    # empty_as_null=True,
    # blanks_as_null=True,
)

# Print the generated COPY statement for inspection.
print(str(copy.compile(dialect=RedshiftDialect(),
                       compile_kwargs={'literal_binds': True})))
# print(dir(copy))  # handy for exploring the object

# Reuses the engine created above.
connection = engine.connect()
connection.execute(copy.execution_options(autocommit=True))
connection.close()

This does essentially the same thing as the plain SQLAlchemy version above, executing the query, except that the query is composed by CopyCommand. I haven't seen much benefit :(.

Anshik

The following works for me with Databricks for all kinds of SQL:

import sqlalchemy as SA
import psycopg2  # the DBAPI driver used by the redshift+psycopg2 dialect

host = 'your_host_url'
username = 'your_user'
password = 'your_passw'
db = 'your_db_name'
port = 5439
url = "{d}+{driver}://{u}:{p}@{h}:{port}/{db}".format(
    d="redshift",
    driver='psycopg2',
    u=username,
    p=password,
    h=host,
    port=port,
    db=db)
engine = SA.create_engine(url)
cnn = engine.connect()

strSQL = "your_SQL ..."
try:
    cnn.execute(strSQL)
except:
    raise
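
For completeness, a minimal sketch of reading rows back over the same connection; the table name `my_table` is hypothetical:

# Fetch a few rows to confirm the connection works end to end.
rows = cnn.execute("select * from my_table limit 10").fetchall()
for row in rows:
    print(row)
cnn.close()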
Jie
import sqlalchemy as db
engine = db.create_engine('postgres://username:password@url:5439/db_name')

This worked for me
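
Note that SQLAlchemy 1.4 and later no longer accept the `postgres://` alias, so on a current install the same connection needs the `postgresql://` scheme (a sketch with the same placeholder URL):

import sqlalchemy as db

# Same placeholders as above; only the scheme name changes.
engine = db.create_engine('postgresql://username:password@url:5439/db_name')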

Achilleus