Why does vacuum reindex on an already sorted table take forever?

Question

I have a stuck 'vacuum reindex' operation and am wondering what may be the cause for it taking such a long time.

I recently changed the schema of one of my Redshift tables, by creating a new table with the revised schema and deep copying the data using 'select into' (see Performing a Deep Copy). My basic understanding was that after deep copying the table, the data should be sorted according to the table's sort-keys. The table has an interleaved 4-column sort-key. Just to make sure, after deep copying I ran the 'interleaved skew' query (see Deciding When to Reindex), and the results were 1.0 for all columns, meaning no skew.

I then ran 'vacuum reindex' on the table, which should be really quick since the data is already sorted. However the vacuum is still running after 30 hours. During the vacuum I examined svv_vacuum_progress periodically to check the vacuum operation status. The 'sort' phase finished after ~6 hours but now the 'merge' phase is stuck in 'increment 23' for >12 hours.

What could be the cause for the long vacuum operation, given that the data is supposed to be already sorted by the deep copy operation? Am I to expect these times for future vacuum operations too? The table contains ~3.5 billion rows and its total size is ~200 GB.

If your interleaved skew is 1.0 why do you even want to REINDEX ? From the documentation "If the skew is greater than 1.4, a VACUUM REINDEX will usually improve performance unless the skew is inherent in the underlying set." ? — Raniendu Singh, Jul 24 '15 at 06:41
Indeed, however my concern is that in the future, when I will have to REINDEX regularly to avoid skew, won't the VACUUM time be at least as long as REINDEXing now on an already sorted table? — matangover, Jul 24 '15 at 13:49

Why does vacuum reindex on an already sorted table take forever?

0 Answers0