Redshift query results incorrect before VACUUM

Question

When Redshift uses an index to run a query (say counts), does it exclude counting the rows in the unsorted region?

I had copied a lot a data using the COPY command, but did not VACUUM the table afterwords. On running my query (involving joins with several tables), the results of the query were wrong - the newly copied rows in the unsorted region weren't counted.

Then, after vacuuming the table, the query began returning the correct results. Is this expected behavior, or is this a bug introduced by Amazon?

Amazon Redshift does not use indexes. – John Rotenstein Sep 30 '17 at 03:16 — John Rotenstein, Sep 30 '17 at 03:16

score 1 · Answer 1 · answered Sep 30 '17 at 01:41

Vacuuming won't have any effect on COPYed rows, which are effectively inserts. Vacuum physically deletes rows previously deleted with an SQL delete statement, which only marks the rows as deleted, so they don't participate in subsequent queries but still consume disk space.

Redshift is an eventually consistent database, so even if your COPY command had completed, the rows may not yet be visible to queries.

Running vacuum is basically a defrag, which requires all rows to be reorganized. This (probably) causes the table to be brought into a consistent state, ie all rows are visible to queries.

Redshift query results incorrect before VACUUM

1 Answers1