
I have started researching Amazon Redshift. It is defined as a "Database" service in AWS. From what I have learnt so far, we can create tables in a Redshift database (cluster) and ingest data from S3 or from external sources like Hive. We can also query these tables over a JDBC connection.

My questions are -

  1. Is there a place within the Redshift cluster where we can store our queries and run them periodically (e.g. daily)?

  2. Can we store a query in an S3 location and use it to write its output to another S3 location?

  3. Can we load a DB2 table unload file with a mixture of binary and string fields into Redshift directly, or do we need an intermediate process to convert the data into something like a CSV?

I have done some Googling about this. If you have links to resources, that would be very helpful. Thank you.

Jaygovind Sahu
  • https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html – John Hanley Mar 21 '18 at 17:41
  • @JohnHanley Thanks for the response. I understand that the results of a query in Redshift can be unloaded into S3 (see the sketch after these comments). My questions are more about where we actually store the queries themselves (can we store them in any of the AWS services, like S3, or in the Redshift cluster itself), not the results. Thank you. – Jaygovind Sahu Mar 21 '18 at 18:05
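Regarding question 2 and the UNLOAD link above: here is a minimal sketch of running UNLOAD from Python with psycopg2 so that a query's results land in S3. The bucket name, IAM role ARN, connection details, and table name below are placeholders, not values from the question.

import psycopg2

# Placeholder connection details; substitute your own cluster endpoint and credentials.
conn = psycopg2.connect(dbname="dev", port=5439, user="awsuser", password="secret",
                        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com")
cursor = conn.cursor()

# UNLOAD writes the result of the inner SELECT to the given S3 prefix.
# The S3 path and IAM role ARN are hypothetical examples.
cursor.execute("""
    UNLOAD ('SELECT * FROM demo_table')
    TO 's3://example-bucket/output/demo_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
    DELIMITER ',' ALLOWOVERWRITE;
""")

conn.commit()
cursor.close()
conn.close()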

1 Answer


I used the cursor method from the psycopg2 library in Python. Sample code is given below. You have to set all the Redshift credentials in the env_vars file. You set your queries with cursor.execute; here I mention one UPDATE query, but you can put your own query (or multiple queries) in its place. After that, add this Python file to crontab or any other scheduler so your queries run periodically (a sample crontab entry follows the code).
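The env_vars module is not shown in the answer; here is a minimal sketch of what it might look like, using the attribute names the connection string below expects (the values are placeholders):

# env_vars.py -- hypothetical module holding the Redshift connection settings
class RedshiftVariables:
    REDSHIFT_DW = "dev"            # database name
    REDSHIFT_HOST = "example-cluster.abc123.us-east-1.redshift.amazonaws.com"
    REDSHIFT_PORT = "5439"
    REDSHIFT_USERNAME = "awsuser"
    REDSHIFT_PASSWORD = "secret"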

import psycopg2
import env_vars

# Build the connection string from the credentials defined in env_vars.py.
conn_string = "dbname=%s port=%s user=%s password=%s host=%s" % (
    env_vars.RedshiftVariables.REDSHIFT_DW,
    env_vars.RedshiftVariables.REDSHIFT_PORT,
    env_vars.RedshiftVariables.REDSHIFT_USERNAME,
    env_vars.RedshiftVariables.REDSHIFT_PASSWORD,
    env_vars.RedshiftVariables.REDSHIFT_HOST,
)
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()

# Put your own query (or several queries) here; this sample runs one UPDATE.
cursor.execute("""UPDATE database.demo_table SET Device_id = '123' WHERE Device = 'IPHONE' OR Device = 'Apple';""")

conn.commit()
cursor.close()
conn.close()
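
To run the script on a schedule, as suggested above, a crontab entry along these lines would work; the script path, interpreter path, and log path are assumptions, so adjust them to your environment:

# Run the update script every day at 02:00 (paths are hypothetical).
0 2 * * * /usr/bin/python /path/to/update_redshift.py >> /var/log/redshift_update.log 2>&1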
Sahil Desai
  • no worries, if you are not familiar with Python you just have to set the credentials and put your queries in `cursor.execute`; the rest of the code remains the same. – Sahil Desai Mar 22 '18 at 06:03