The goal is to unload a few tables (per customer) to S3 in Parquet format every few hours.
Each table is around 1 GB in CSV format, which comes out to roughly 120 MB as Parquet.
The issue is that when 2-3 UNLOAD commands run in parallel, the CPU on the Redshift nodes in the cluster climbs to 98-100%.
This is the command I use:
unload ('select * from my_table')
to 's3://test_bucket'
iam_role 'my_arn'
allowoverwrite
format PARQUET;
I also tried unloading the data as CSV, but the nodes still went to around 98-100% CPU.
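For reference, the CSV attempt was roughly the same command with only the output format changed:
unload ('select * from my_table')
to 's3://test_bucket'
iam_role 'my_arn'
allowoverwrite
format CSV;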
I would like to save a snapshot of the whole table (a few tables) every few hours, and I need all the columns, so I have to use select *.
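To illustrate what a snapshot run means here: it is a full export of the table, and it could just as well land under a per-run prefix instead of overwriting the previous files (the time-based prefix below is only an example, not what I currently run):
unload ('select * from my_table')
to 's3://test_bucket/my_table/2023-01-01T00/'  -- example time-based snapshot prefix
iam_role 'my_arn'
allowoverwrite
format PARQUET;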
Is there another approach I could take that would not push the Redshift CPU so high?
The node type is ra3.xlplus with 4 nodes.