Calculating deduplication space savings

File Services Administration Guide for Hitachi NAS Platform

Version
14.7.x
14.6.x
Part Number
MK-92HNAS006-29

The following example describes how deduplication space savings are calculated. The calculation is valid only for file systems on which the file system packing feature is disabled.

Consider 100 TB of data that, before deduplication, consists of the following two groups:
  • Group A: 30 TB of distinct data
  • Group B: 70 TB of duplicated data that contains only 10 TB of unique data blocks.
    • Any given data block in Group B may have one or more identical data blocks in Group B, but none in Group A. A data block in Group A has no identical data block in either group.
After both Group A and Group B have gone through the dedupe process:
  • Group A had no duplicates removed and consumed the same 30 TB.
  • Group B had duplicates removed and consumed only 10 TB to hold the unique data blocks.
  • Group B (70 TB) = {Group C (10 TB of unique data that remains on disk)} + {Group D (60 TB of deduped data that now shares, or points to, the physical blocks of Group C)}
  • The original 100 TB of data now requires only 40 TB (30 plus 10) of physical blocks because all duplicates were removed. However, the logical data size is 100 TB (30 plus 70), which is the amount of space needed if the data were not deduped. The results are outlined in the following table:
    Used space: The amount of physical disk space used by the file system. In this example, Group A + Group C = 30 + 10 = 40 TB.
    Deduped space: The amount of duplicate data that does not occupy its own physical disk space, but has been deduped to share existing physical blocks. In this example, Group D = 60 TB.
    Logical space: The amount of physical disk space that would be required if the data were not deduped = {used space} + {deduped space} = 40 + 60 = 100 TB.

The dedupe percentage expresses the amount of physical disk space saved by removing the duplicates. It is measured against the logical space, that is, the amount of space that would be required if the data were not deduped. In this example, 60 TB of deduped space out of 100 TB of logical space gives a dedupe percentage of 60%.
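
The arithmetic in this example can be reproduced with a short script. The following Python sketch is illustrative only and is not part of any NAS Platform tool. The function name dedupe_summary and its parameters are hypothetical, and the percentage formula (deduped space divided by logical space) is an assumption based on the description above.

```python
def dedupe_summary(distinct_tb, duplicated_tb, unique_in_duplicated_tb):
    """Return the space figures for the example, all in TB.

    distinct_tb             -- Group A: data with no duplicates
    duplicated_tb           -- Group B: duplicated data before dedupe
    unique_in_duplicated_tb -- Group C: unique blocks within Group B
    """
    # Used space: physical blocks that remain on disk (Group A + Group C).
    used_tb = distinct_tb + unique_in_duplicated_tb
    # Deduped space: duplicates that now share Group C's blocks (Group D).
    deduped_tb = duplicated_tb - unique_in_duplicated_tb
    # Logical space: what the data would occupy without dedupe.
    logical_tb = used_tb + deduped_tb
    # Assumed formula: savings measured against logical space.
    percentage = 100.0 * deduped_tb / logical_tb if logical_tb else 0.0
    return {
        "used_tb": used_tb,
        "deduped_tb": deduped_tb,
        "logical_tb": logical_tb,
        "dedupe_percentage": percentage,
    }


# Figures from the example: Group A = 30 TB, Group B = 70 TB, Group C = 10 TB.
summary = dedupe_summary(30, 70, 10)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With the example figures, the script reports 40 TB of used space, 60 TB of deduped space, 100 TB of logical space, and a dedupe percentage of 60.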