There are basically two options for connecting to Amazon Redshift using Python.
Option 1: JDBC Connection
This is a traditional connection to a database. The popular choice tends to be using psycopg2
to establish the connection, since Amazon Redshift resembles a PostgreSQL database. You can download specific JDBC drivers for Redshift.
This connection would require the Redshift database to be accessible to the computer making the query, and the Security Group would need to permit access on port 5439. If you are trying to connect from a computer on the Internet, the database would need to be in a Public Subnet and set to Publicly Accessible = Yes
.
See: Establish a Python Redshift Connection: A Comprehensive Guide - Learn | Hevo
Option 2: Redshift Data API
You can directly query an Amazon Redshift database by using the Boto3 library for Python, including an execute_statement()
call to query data and a get_statement_result()
call to retrieve the results. This also works with IAM authentication rather than having to create additional 'database users'.
There is no need to configure Security Groups for this method, since the request is made to AWS (on the Internet). It also works with Redshift databases that are in private subnets.