Dedup size picture

Hi all,
When I set up Windows deduplication, I get great savings. See the attached picture: 1.28 TB deduplicated down to 23 GB. However, the space savings are not real.

Let's say that volume is 2 TB. According to the screenshot, only 28 GB are used, so I should be able to add another 1.9 TB of data. In reality, once I add ~800 GB of data (hitting the real 2 TB limit), the volume fills up. The attached picture would then say data size 2 TB, size on disk 50 GB (or something like that).

What is the point of data deduplication if I cannot utilize the space savings? Or is there some trick to setting this up that I'm not seeing? I've had the same results on Server 2012 R2 and 2016. I've tried it at the Hyper-V level with VHDXs, and on a backup server with large backup files.

GeoffM
  • Please post the output of the PowerShell cmdlet: Get-DedupStatus | Select * The File Explorer view you posted does not show the amount of free space on the volume, just the size on disk of the files you selected. This may not contain all the files on the volume, even if you selected the drive, because not all files would be visible to be selected. Thanks, Will Gries Program Manager, Data Deduplication Microsoft – Will Gries Oct 30 '17 at 17:47
  • ObjectId : \\?\Volume{46a06222-de20-4266-a6c4-cdc2f06621b2}\
    Capacity : 15000036962304
    FreeSpace : 13398178594816
    InPolicyFilesCount : 260
    InPolicyFilesSize : 914716578153
    LastGarbageCollectionResult : 0
    LastGarbageCollectionResultMessage : The operation completed successfully.
    LastScrubbingTime : 10/28/2017 9:36:27 AM
    OptimizedFilesCount : 262
    OptimizedFilesSavingsRate : 2
    OptimizedFilesSize : 1509620257129
    SavedSpace : 30667373364
    – GeoffM Oct 30 '17 at 18:13
  • I added as much as I could before the limit kicked in. – GeoffM Oct 30 '17 at 18:14
  • Hmm... these numbers don't seem to add up to the numbers you provided above... 13.6 TiB capacity and 12.18 TiB free space, 28.56 GiB saved space. Unfortunately, you did not post used space. I cannot explain from the information provided thus far why you cannot fill up the volume. Do you get an error message when you try to create a new large file? – Will Gries Oct 30 '17 at 18:25
  • My example of a 2 TB drive was for a different server. It gave too many errors, so I disabled dedup. The screenshot above is from the last server that still has it enabled, which is a high-capacity backup server. The errors from the previous server were simply related to putting too much data on it, hence my confusion as to what dedup actually does. – GeoffM Oct 30 '17 at 18:31
  • I would love to get to the bottom of what errors you're seeing on your other server, but I'm not sure how we can do that if you have Dedup turned off there. My gut reaction from reading this post is that this is a configuration issue - would it be possible to re-enable it on your 2 TiB volume and share the error messages? – Will Gries Oct 30 '17 at 18:34

1 Answer


The space savings are real. You can check the dedup ratio in Server Manager. Upgrading to 2012 R2 and using deduplication has saved us from buying new storage for at least a year.

Don't try to add up the logical file sizes and compare them with "Size on Disk" - the latter is only the sum of the non-deduplicated files/parts. Deduplicated data doesn't show up there at all. The free space of the volume can't be calculated that way; check the properties of the drive/volume instead.
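
If you'd rather check this from PowerShell than Server Manager, a minimal sketch like the one below (assuming the deduplicated volume is D: - adjust the drive letter) shows both what the dedup engine reports and what the file system actually has free:

    # What the dedup engine reports for the volume
    Get-DedupVolume -Volume D: |
        Format-List Volume, Capacity, FreeSpace, UsedSpace, SavedSpace, SavingsRate

    # What the file system itself reports as free space
    Get-Volume -DriveLetter D | Format-List Size, SizeRemaining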

This makes sense - when two files have 80% in common, what size on disk does each file occupy?

The background is that deduplicated parts (or whole files) are stored in the deduplication pool. The non-deduplicated parts of a file are stored in a sparse file, with reparse points where the deduplicated parts used to be.
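
You can see this directly on any file that an optimization job has already processed - for example (the path below is just a placeholder, not your setup):

    # An optimized file carries the ReparsePoint and SparseFile attributes;
    # its size on disk is only the non-deduplicated remainder of the file.
    (Get-Item 'D:\Backups\backup01.vhdx').Attributes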

The dedup pool hides inside the System Volume Information folder - if you check its size, you know how much deduplicated/compressed data is stored on the drive. Add that to the Size on Disk of your files and you'll be pretty close to the actual volume utilization.
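
A rough way to measure that from an elevated PowerShell prompt - the ChunkStore path below is the usual location but treat it as an assumption, and you may need to grant yourself access to System Volume Information first:

    # Sum the size of the dedup chunk store (in GB) - this is where the
    # deduplicated/compressed chunks live. Adjust the drive letter as needed.
    $chunkStore = 'D:\System Volume Information\Dedup\ChunkStore'
    (Get-ChildItem -LiteralPath $chunkStore -Recurse -Force -File |
        Measure-Object -Property Length -Sum).Sum / 1GB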

There's a pretty good primer from MS for this: https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/understand

Zac67
  • I've actually read the entire guide to the new features in 2016. Data Dedup seems really neat, I just can't get it implemented (or maybe I just don't understand). Let's say I have 2 TB of PDFs and they're 80% in common. That should yield 400 GB. Shouldn't I be able to put 1.6 TB more on a 2 TB drive? – GeoffM Oct 30 '17 at 18:17
  • When you've got 1000 files of 1 GB each (1 TB total file size) and they all have 80% in common, you need 0.2 × 1 GB × 1000 = 200 GB for the unique parts ("Size on Disk") and 0.8 × 1 GB × 1 = 0.8 GB for the dedup pool (plus file system overhead). – Zac67 Oct 30 '17 at 18:41