
I have a scenario where I'm fetching data from one database (Postgres) and loading it into a table in a different database (Redshift).

Is there any way in Kettle to schedule this job?

It's a simple "insert into redshift select * from postgres".

Rishu Shrivastava
FirstName
  • Table Input (connection to Postgres) -> Table Output (connection to Redshift). But adjust data types in between if needed. – simar Aug 09 '16 at 13:50
  • Get the JDBC driver for Amazon Redshift and copy it to $KETTLE_HOME/lib. – simar Aug 09 '16 at 13:51
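
As a rough sketch of the second comment, something like the following puts the driver in place; the jar file name, download location, and $KETTLE_HOME path are assumptions and will vary with the driver version and your PDI installation:

    # Copy the Amazon Redshift JDBC driver into Kettle's lib directory so the
    # Table Output connection can use it; restart Spoon/Kitchen afterwards.
    cp ~/Downloads/RedshiftJDBC42-*.jar "$KETTLE_HOME/lib/"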

1 Answer


Using a Table Output step can be painfully slow, as Redshift is optimized for bulk inserts, not row-by-row inserts. AFAIK there are no steps/plugins in Kettle for bulk inserts into Redshift. What you can do is write a script in a Shell step (a rough sketch follows the list below) that:

  1. dumps the data from Postgres to a file
  2. copies the data to S3: https://anotherreeshu.wordpress.com/2015/11/30/loading-data-to-aws-s3-bucket-pentaho-data-integration/
  3. inserts the data from S3 to Redshift: https://anotherreeshu.wordpress.com/2015/12/11/loading-data-from-s3-to-redshift-pentaho-data-integration/
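
A minimal sketch of such a script, assuming psql and the AWS CLI are installed on the machine running the job; the table name, connection details, file path, S3 bucket, and IAM role below are placeholders you'd replace with your own:

    #!/bin/sh
    set -e

    # 1. Dump the data from Postgres to a local CSV file
    psql "host=pg-host dbname=sourcedb user=pguser" \
         -c "\copy my_table to '/tmp/my_table.csv' csv"

    # 2. Copy the file to S3 (requires the AWS CLI and credentials)
    aws s3 cp /tmp/my_table.csv s3://my-bucket/staging/my_table.csv

    # 3. Bulk-load from S3 into Redshift with COPY
    psql "host=my-cluster.redshift.amazonaws.com port=5439 dbname=targetdb user=rsuser" \
         -c "COPY my_table FROM 's3://my-bucket/staging/my_table.csv' IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role' FORMAT AS CSV;"

The COPY from S3 in step 3 is the bulk path referred to above, which is why this is much faster than pushing rows one at a time through a Table Output step.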
matthiash