I want to load data into an Amazon Redshift cluster using a boto3 Python script.

I want to create a script using boto3 in Python to do the following:

  1. Create a cluster
  2. Load data into the cluster
  3. Create a report on the performance of the cluster

As far as I can see, boto3 has no methods for loading data into the cluster, whether from a flat file or from S3.

How can I load the data into the cluster using boto3 or any other Python package?

John Rotenstein
ML2019

1 Answer

1. Create an Amazon Redshift Cluster

Call the create_cluster() method of the boto3 Redshift client.
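As a rough illustration of that call, here is a minimal sketch. The helper names `build_cluster_request` and `create_and_wait`, the `dc2.large` node type, the `dev` database name and the identifiers in `main()` are assumptions made for the example, not values from the question. The client is passed in as an argument so the logic can be tried without touching a real account.

```python
def build_cluster_request(cluster_id, master_user, master_password,
                          db_name="dev", node_type="dc2.large",
                          iam_role_arn=None):
    """Assemble keyword arguments for the Redshift create_cluster() call."""
    params = {
        "ClusterIdentifier": cluster_id,
        "ClusterType": "single-node",   # assumption: smallest possible cluster
        "NodeType": node_type,          # assumption: pick one your account offers
        "DBName": db_name,
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
    }
    if iam_role_arn:
        # Role the cluster can assume later, e.g. to read from S3 during COPY.
        params["IamRoles"] = [iam_role_arn]
    return params


def create_and_wait(redshift, params):
    """Create the cluster, wait until it is available, return (host, port)."""
    redshift.create_cluster(**params)
    waiter = redshift.get_waiter("cluster_available")
    waiter.wait(ClusterIdentifier=params["ClusterIdentifier"])
    described = redshift.describe_clusters(
        ClusterIdentifier=params["ClusterIdentifier"])
    endpoint = described["Clusters"][0]["Endpoint"]
    return endpoint["Address"], endpoint["Port"]


def main():
    # Not called automatically: this creates a billable cluster.
    import boto3

    redshift = boto3.client("redshift")
    params = build_cluster_request(
        "example-cluster",              # hypothetical identifier
        "awsuser",                      # hypothetical master user
        "ChangeMe-Example-1",           # hypothetical password
    )
    host, port = create_and_wait(redshift, params)
    print(f"Cluster endpoint: {host}:{port}")
```

The returned host and port are what a SQL client needs in order to connect in the next step.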

2. Load data into the cluster

Amazon Redshift is based on PostgreSQL 8.0.2 and behaves much like a normal PostgreSQL database. To run commands on the database itself (including the COPY command), you should establish a connection to the database, for example over JDBC/ODBC or with a PostgreSQL driver.

See: Connecting to an Amazon Redshift Cluster Using SQL Client Tools - Amazon Redshift

A common method is to use psycopg2:

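import psycopg2

conn = psycopg2.connect(...)  # cluster endpoint, port, database, user, password
cur = conn.cursor()
cur.execute("COPY...")        # COPY statement that loads the data
conn.commit()                 # make the loaded rows permanent
cur.close()
conn.close()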

See: Copying data from S3 to AWS redshift using python and psycopg2
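To make the COPY step more concrete, the following is a minimal sketch that assumes the data is CSV in S3 and that the cluster has an IAM role allowed to read it. The helpers `build_copy_sql` and `load_from_s3`, the table name, bucket, role ARN and connection details are all hypothetical and would need to be replaced with real values.

```python
import re

# Accepts "table" or "schema.table" made of plain identifier characters.
_TABLE_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_copy_sql(table, s3_uri, iam_role_arn, ignore_header=True):
    """Build a Redshift COPY statement for CSV files stored in S3."""
    if not _TABLE_RE.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    for value in (s3_uri, iam_role_arn):
        if "'" in value:
            raise ValueError("quotes are not allowed in COPY arguments")
    sql = f"COPY {table} FROM '{s3_uri}' IAM_ROLE '{iam_role_arn}' FORMAT AS CSV"
    if ignore_header:
        sql += " IGNOREHEADER 1"   # skip the header row of each file
    return sql


def load_from_s3(conn, copy_sql):
    """Run the COPY statement on an open connection and commit it."""
    cur = conn.cursor()
    try:
        cur.execute(copy_sql)
    finally:
        cur.close()
    conn.commit()


def main():
    # Not called automatically: needs a reachable cluster and real credentials.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
        port=5439,
        dbname="dev",
        user="awsuser",
        password="ChangeMe-Example-1",
    )
    try:
        load_from_s3(conn, build_copy_sql(
            "public.sales",                                   # hypothetical table
            "s3://example-bucket/sales/",                     # hypothetical prefix
            "arn:aws:iam::123456789012:role/ExampleCopyRole"  # hypothetical role
        ))
    finally:
        conn.close()
```

The target table must already exist before COPY runs. Redshift reads the files from S3 itself, so the script only sends the statement and never streams the data through Python.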

3. Create a report on the performance of the cluster

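There are two sources of information for performance reporting:

  • Amazon CloudWatch metrics for the cluster
  • Query/load performance data recorded by Amazon Redshift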

See: Monitoring Amazon Redshift Cluster Performance - Amazon Redshift
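As one possible starting point for such a report, the sketch below assumes Amazon CloudWatch is the source and pulls the average `CPUUtilization` metric from the `AWS/Redshift` namespace through boto3. The helper names `fetch_cpu_datapoints` and `format_report`, the one-hour window, the five-minute period and the cluster identifier are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def fetch_cpu_datapoints(cloudwatch, cluster_id, hours=1, period=300):
    """Return average CPU datapoints for the cluster, oldest first."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Redshift",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": cluster_id}],
        StartTime=start,
        EndTime=end,
        Period=period,              # seconds per datapoint
        Statistics=["Average"],
    )
    # CloudWatch does not guarantee ordering, so sort by timestamp.
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])


def format_report(cluster_id, datapoints):
    """Render the datapoints as a small plain-text report."""
    lines = [f"CPU utilization for {cluster_id}"]
    for point in datapoints:
        lines.append(f"{point['Timestamp']:%Y-%m-%d %H:%M}  {point['Average']:.1f}%")
    if datapoints:
        mean = sum(p["Average"] for p in datapoints) / len(datapoints)
        lines.append(f"mean: {mean:.1f}%")
    else:
        lines.append("no datapoints in the selected window")
    return "\n".join(lines)


def main():
    # Not called automatically: needs AWS credentials and an existing cluster.
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cluster_id = "example-cluster"   # hypothetical identifier
    print(format_report(cluster_id, fetch_cpu_datapoints(cloudwatch, cluster_id)))
```

Other metrics in the same namespace can be fetched the same way by changing `MetricName`. Per-query and per-load detail comes from the cluster itself rather than from CloudWatch.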

John Rotenstein