
I have a scenario where I'm fetching data from one database (Postgres) and loading it into a table in a different database (Redshift).

Is there any way in Kettle to schedule this job?

It's a simple "insert into redshift select * from postgres".

Rishu Shrivastava
FirstName
  • Table Input (connection to Postgres) -> Table Output (connection to Redshift). But adjust data types in between if needed. – simar Aug 09 '16 at 13:50
  • Get the JDBC driver for Amazon Redshift and copy it to $KETTLE_HOME/lib. – simar Aug 09 '16 at 13:51
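
As a rough sketch of the second comment, something like the following puts the driver in place; the jar file name, download location, and $KETTLE_HOME path are assumptions and will vary with the driver version and your PDI installation:

    # Copy the Amazon Redshift JDBC driver into Kettle's lib directory so the
    # Table Output connection can use it; restart Spoon/Kitchen afterwards.
    cp ~/Downloads/RedshiftJDBC42-*.jar "$KETTLE_HOME/lib/"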

1 Answer


Using a Table Output step can be painfully slow, as Redshift is optimized for bulk inserts, not row-by-row inserts. AFAIK there are no steps/plugins in Kettle for bulk inserts into Redshift. What you can do is write a script in a Shell step (a rough sketch follows the list below) that:

  1. dumps the data from Postgres to a file
  2. copies the data to S3: https://anotherreeshu.wordpress.com/2015/11/30/loading-data-to-aws-s3-bucket-pentaho-data-integration/
  3. inserts the data from S3 to Redshift: https://anotherreeshu.wordpress.com/2015/12/11/loading-data-from-s3-to-redshift-pentaho-data-integration/
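
A minimal sketch of such a script, assuming psql and the AWS CLI are installed on the machine running the job; the table name, connection details, file path, S3 bucket, and IAM role below are placeholders you'd replace with your own:

    #!/bin/sh
    set -e

    # 1. Dump the data from Postgres to a local CSV file
    psql "host=pg-host dbname=sourcedb user=pguser" \
         -c "\copy my_table to '/tmp/my_table.csv' csv"

    # 2. Copy the file to S3 (requires the AWS CLI and credentials)
    aws s3 cp /tmp/my_table.csv s3://my-bucket/staging/my_table.csv

    # 3. Bulk-load from S3 into Redshift with COPY
    psql "host=my-cluster.redshift.amazonaws.com port=5439 dbname=targetdb user=rsuser" \
         -c "COPY my_table FROM 's3://my-bucket/staging/my_table.csv' IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role' FORMAT AS CSV;"

The COPY from S3 in step 3 is the bulk path referred to above, which is why this is much faster than pushing rows one at a time through a Table Output step.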
matthiash