loading a pg_dump off of s3 into redshift

Question

I'm trying to load a complete database dump into Redshift. Is there a single command to restore the data from a pg_dump living on s3 into Redshift? If not, what are the best steps for tackling this?

Thanks

score 2 · Accepted Answer · answered Nov 30 '17 at 08:29

If you have a non compressed pg_dump this should be possible using a psql command (you may need to manually edit to get the right syntax, depending on your versions and options set). However this is a very inefficient and slow way to load redshift and I do not recommend it. If your tables are large it could take days or weeks!

What you need to do is this:

create target tables on redshift based upon the source table, but considering sort keys and distribution.
unload you postgres source tables into csv files using postgres "copy" command
If the source csv files are very big (e.g. more than say 100MB), consider splitting these into separate files as they will load faster (redshift will parallelize)
gzip the csv files (recommended but not essential)
upload these csv files to s3, with a separate folder per table
load the data into redshift from s3 by using the redshift copy command

loading a pg_dump off of s3 into redshift

1 Answers1