0

I have a Redshift cluster that consists of 2 nodes with 160 Gb disks. I'm randomly getting "Disk full" error when running vacuum or any other query. My disk usage is 92%. I did delete more than a half of the old rows in table that is 10515 Mb in size, but even after rebooting the cluster there's no effect and table still of the same size, though count shows new number of rows. I should have a seen at lease small decrease in disk usage, but there's nothing.

Does anyone has any clues what it might be? Is deleting table in this case is the only option?

viktor117
  • 21
  • 4

1 Answers1

1

There are a few possibilities here but first let me check the facts. You have a 2 node dc2.large cluster and it is 92% disk full. This is too full and needs to lowered to be lowered to provide temp space for query execution. You have a table that is 10515 blocks in size. To address the disk space concern you deleted 1/2 of the rows in the table in question and then vacuumed the table. Once complete you didn't see any change to the cluster space nor the size of the table, not one block difference in table size. Do I have this correct?

First possibility is that the vacuum did not complete correctly. You mention that you are getting disk full messages even when vacuuming. So could it be that the vacuum you tried is not completing? You see vacuum need temp space to sort the table data and if you have a cluster that has gotten too full then the vacuum could fail. In this case you can run a delete-only vacuum that will not attempt to sort the table, just reclaim disk space. This will have a higher likelihood of success in a disk full situation.

Another possibility is that the delete of rows didn't complete correctly or wasn't committed before the vacuum was run. This will cause the vacuum to run on the full set of rows.

It is also possible that the table in question is very wide (many columns). This matters because of how Redshift stores data - each block is 1MB in size and each column needs a block for its data. This cluster has 4 slices and if this table is 1,500 columns wide (yes, that is silly wide) the table will take up 6,000 blocks to just store the first 4 rows. Then it takes no additional disk space to add rows until these blocks start to fill up. The table size will move in very large chunks and when removing rows the size may not change except in large chunks. This is unlikely to be what is happening if you are seeing EXACTLY the same number of blocks but if you are just seeing changes in blocks that are less than you expect this could be in play.

There could be some some other misunderstanding happening. A sort-only vacuum won't free up space. The node type isn't what I think it is. The table could live in S3 and be access through spectrum. But based on the description these don't seem likely.

UNSOLICITED ADVICE: You are on the right track by freeing up disk space but you need to take more action than reducing this one table. (I expect you realize this and this is just a start.) You should be operating below 70% disk full in most cases - this varies by workload and table sizes but is a good general rule. This means reducing a great deal of data on your disks or increasing your node count (and cost). Migrating some data to S3 and using Spectrum to access could be an option. If you need more storage w/o more compute you can look at the storage optimized nodes but since you are at the very smallest end of Redshift these likely aren't a win for you. You need to 1) remove unneeded data, 2) move some data to S3 and use Spectrum, or 3) add a node you your cluster.

Bill Weiner
  • 8,835
  • 2
  • 7
  • 18
  • Thank you Bill Weiner You got everything correctly. After posting message I went cleaning other tables and then the result was visible and what I've expected. Original table that was 10515 blocks got actually reduced alongside with others that I've cleaned. Some other big tables still show thousands of unsorted mbytes, so more cleanup needed. + options 2 and 3 are what I'll do next – viktor117 Mar 30 '21 at 07:10