
We need to remove old data from a huge table once a year. The rows of the table are rather compact (around 40 bytes), and there is just a clustered index on the table.

The database is about 750 GB in total, and the table being worked on is 640 GB in size and contains 8.7 billion rows before cleaning. After the delete, only 3.7 billion rows remain, but the data size only drops to about 500 GB.

These numbers look odd, but they are expected: each page had some rows removed. Some pages were emptied and deallocated, some were untouched and are still 100% full, but most pages are now only partly filled, so there is a lot of unclaimed space on each page.

To reclaim this space, I need to rebuild the index. My question is: How to defrag an index in a database that is about the size of the index itself?

If I remember correctly, an index REBUILD will require free space of roughly 1.3x the size of the index it works on, because it copies the data in sorted order. The database would grow by almost 1 TB, and this new space will not be needed once the rebuild finishes.

Shrinking the database back after the rebuild is not helpful, as it introduces new (heavy) fragmentation.

I am aware of the SORT_IN_TEMPDB option. Is there an estimate of how much free space will still be required in the database with this setting?
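For reference, this is roughly the statement I have in mind; the table and index names are placeholders for the real ones:

    -- Placeholder names: dbo.Measurements with clustered index CIX_Measurements.
    -- SORT_IN_TEMPDB = ON moves the sort runs to tempdb, but the rebuilt copy of
    -- the index is still written into the user database, so the data file has to
    -- hold the old and the new copy side by side until the rebuild commits.
    ALTER INDEX CIX_Measurements ON dbo.Measurements
    REBUILD WITH (SORT_IN_TEMPDB = ON, MAXDOP = 4);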

As an alternative, I could drop & recreate the clustered index, but I am unsure what the space requirement for that operation is.
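The drop-and-recreate variant I am thinking of would look roughly like this (names are placeholders again); I have also listed the single-step DROP_EXISTING form, which as far as I know behaves much like a rebuild internally:

    -- Placeholder names; if the clustered index backs a PRIMARY KEY constraint,
    -- the constraint would have to be dropped and recreated instead of the index.
    DROP INDEX CIX_Measurements ON dbo.Measurements;   -- table becomes a heap
    CREATE CLUSTERED INDEX CIX_Measurements
        ON dbo.Measurements (PointDate, ValidFrom);     -- all rows are rewritten

    -- One-step alternative:
    CREATE CLUSTERED INDEX CIX_Measurements
        ON dbo.Measurements (PointDate, ValidFrom)
        WITH (DROP_EXISTING = ON, SORT_IN_TEMPDB = ON);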

Reorganising the index does not (as far as I know) reclaim the free space on each page, so this operation is also not what I want.
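For completeness, the reorganise I am ruling out would simply be (placeholder names again):

    -- Reorganise works in place, page by page, and needs almost no extra space,
    -- but I do not expect it to give back the per-page free space I need.
    ALTER INDEX CIX_Measurements ON dbo.Measurements REORGANIZE;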

Thanks for any ideas! Ralf


1 Answer


we need to remove old data from a huge table once a year.

That is exactly the use case for partitioning. Partition per year, drop the old partition, done. Downtime? Milliseconds if you do it smart.
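A minimal sketch of what that could look like, with invented names and assuming SQL Server 2016+ for the partition-level TRUNCATE (on older versions you would SWITCH the old partition into a staging table and drop that):

    -- One partition per year; all names and columns are made up for illustration.
    CREATE PARTITION FUNCTION pf_ByYear (date)
        AS RANGE RIGHT FOR VALUES ('2016-01-01', '2017-01-01', '2018-01-01');

    CREATE PARTITION SCHEME ps_ByYear
        AS PARTITION pf_ByYear ALL TO ([PRIMARY]);

    CREATE TABLE dbo.Measurements
    (
        PointDate date      NOT NULL,
        ValidFrom datetime2 NOT NULL,
        ValidTo   datetime2 NOT NULL,
        Value     float     NOT NULL
    ) ON ps_ByYear (PointDate);

    CREATE CLUSTERED INDEX CIX_Measurements
        ON dbo.Measurements (PointDate, ValidFrom)
        ON ps_ByYear (PointDate);

    -- Yearly cleanup: empty the oldest partition instead of deleting rows.
    TRUNCATE TABLE dbo.Measurements WITH (PARTITIONS (1));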

TomTom
  • Thanks - but... I left out some crucial information that makes partitioning hard in this case. The database is a time-series store, but fully versioned. That means there are two timestamps for each data point: the date the value applies to, and a from-to date range for when this was the most recent value known for that day. Cleaning means removing data for days that are no longer in scope (business) and removing data history (purely technical). Your clustering key would need to factor in both aspects of "old" data. – Ralf Nov 05 '18 at 09:21
  • The question asked was: space overhead for rebuild vs. recreate. This answer contains no information at all about the space requirements of rebuild/reorganise operations. – Ralf Nov 05 '18 at 09:30