In Redshift, is it possible to pause/resume a VACUUM FULL job?

Question

I'm working with on an exceptionally large table which due to some data issue, I have to re-insert data on a couple of historical dates. After the insertion, I wanted to perform a manually triggered VACUUM FULL operation. However, unfortunately, the VACUUM FULL operation on that table takes more than several days to complete. Since, in Redshift, only one VACUUM operation can happen at a time, that also means that other smaller tables will not be able to perform their daily VACUUM operation until that large table is done with its VACUUM operation.

My question is, is there a way to pause a VACUUM operation on that large table to give some room for VACUUM-ing the smaller tables? Will terminating a VACUUM operation resets the operation or will re-running the VACUUM command able to resume the operation from the last successful state?

Sorry, I'm trying to learn more about how the VACUUM process works in Redshift but I am not able to find too much info on it. Would be really appreciated to have some explanation/docs in your answer as well.

Note: I did tried to perform a deep copy as mentioned in the official docs. However, the table is too large to copy on one go so it's not an option.

Thanks!

score 2 · Answer 1 · answered Jan 28 '23 at 02:27

As far as I know there is no way to pause a (manual) vacuum. However, vacuum runs in "passes" where some vacuum work is done and partial results are committed. This allows for ongoing work to progress while the vacuum is running. If you terminate the vacuum midway the previously committed blocks will save the partial work. Some work will be lost for the current pass and the restarted vacuum will need to scan the entire table to figure out where to start. Last I knew this will work but you will lose some progress with each terminate.

If you manage the update / consistency issues a deep copy can be a faster way to go. You don't have to do this in a single pass - you can do it in parts. You need the space to store the second version of the table but you won't need the space to sort the whole table in one go. For example if you have a table, let's say with 10 years of data, you can insert the first year into the new table (sorted of course). Then the second and so on. There may be some partial empty blocks at the boundaries but these are easy to fix up with a delete only vacuum (or just wait for auto-vacuum to do it).

If you can do it the deep copy method will be faster as it doesn't need to keep consistency or play nice with other workloads.

In Redshift, is it possible to pause/resume a VACUUM FULL job?

1 Answers1