How data reduction operates

Content Software for File User Guide

Version: 4.2.x
Part Number: MK-HCSF000-03

Data reduction is a post-process activity: new data is written to the cluster uncompressed, and reduction is applied afterward. The data reduction process runs as a background task at a lower priority than tasks serving user IO requests, and it starts once enough data has been written to the filesystems.
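As a rough illustration of this post-process model, the following Python sketch serves user IO ahead of a background reduction task and queues reduction only after a write threshold is crossed. The class name, the priority values, and the 1 MiB threshold are assumptions made for the example; this guide does not state the actual trigger point or describe the scheduler.

```python
import heapq
from itertools import count

USER_IO = 0                      # lower number is served first (assumed scheme)
BACKGROUND = 1
REDUCTION_THRESHOLD = 1 << 20    # hypothetical: 1 MiB of new uncompressed data


class ClusterSketch:
    """Toy model: writes land uncompressed; reduction is queued later."""

    def __init__(self):
        self._queue = []
        self._seq = count()      # tie-breaker keeps FIFO order within a priority
        self.pending_uncompressed = 0
        self.reduction_queued = False

    def _submit(self, priority, name):
        heapq.heappush(self._queue, (priority, next(self._seq), name))

    def write(self, nbytes):
        # New data is written uncompressed and handled as user IO.
        self._submit(USER_IO, f"write:{nbytes}")
        self.pending_uncompressed += nbytes
        # Reduction is queued only once enough data has accumulated.
        if (not self.reduction_queued
                and self.pending_uncompressed >= REDUCTION_THRESHOLD):
            self._submit(BACKGROUND, "data-reduction")
            self.reduction_queued = True

    def run(self):
        """Drain the queue and return task names in execution order."""
        order = []
        while self._queue:
            _, _, name = heapq.heappop(self._queue)
            order.append(name)
        return order


cluster = ClusterSketch()
cluster.write(600 * 1024)
cluster.write(600 * 1024)   # crosses the assumed threshold; reduction is queued
cluster.write(4096)         # later user IO still runs before reduction
print(cluster.run())
# ['write:614400', 'write:614400', 'write:4096', 'data-reduction']
```

In this sketch the reduction task is queued before the last write arrives, yet it runs last because user IO carries the higher priority.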

Data reduction tasks

Ingestion
  • Clusterization: Applied to data at the 4K block level. The system identifies similarity across the uncompressed data in all filesystems enabled for data reduction.
  • Compression: The system reads the similar and unique blocks and compresses each type separately. The compressed data is then written to the filesystem.
Defragmentation
  • Uncompressed data that has been successfully compressed is marked for deletion.
  • The defrag process waits until enough blocks have been invalidated and then permanently deletes them.
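
The ingestion and defragmentation tasks above can be sketched in Python as follows. This is a simplified model, not the product's implementation: the `signature` function (first 64 bytes of a block), the use of `zlib`, and `DEFRAG_THRESHOLD` are all assumptions chosen for illustration, since this guide does not describe the similarity algorithm, the compressor, or the invalidation threshold.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096        # 4K block level, as stated in the text
DEFRAG_THRESHOLD = 4     # hypothetical: invalidated blocks needed before deletion


def signature(block):
    # Stand-in similarity signature (assumption): the first 64 bytes.
    return block[:64]


def clusterize(blocks):
    """Group block IDs by signature; return (similar_groups, unique_ids)."""
    groups = defaultdict(list)
    for block_id, data in blocks.items():
        groups[signature(data)].append(block_id)
    similar = [ids for ids in groups.values() if len(ids) > 1]
    unique = [ids[0] for ids in groups.values() if len(ids) == 1]
    return similar, unique


def ingest(blocks):
    """Compress similar and unique blocks separately; mark originals."""
    similar, unique = clusterize(blocks)
    compressed, invalidated = [], set()
    for ids in similar:
        # Similar blocks are compressed together so shared content is stored once.
        payload = zlib.compress(b"".join(blocks[i] for i in ids))
        compressed.append((tuple(ids), payload))
        invalidated.update(ids)      # marked only after compression succeeded
    for i in unique:
        compressed.append(((i,), zlib.compress(blocks[i])))
        invalidated.add(i)
    return compressed, invalidated


def defrag(blocks, invalidated):
    """Delete invalidated blocks once enough have accumulated."""
    if len(invalidated) < DEFRAG_THRESHOLD:
        return 0                     # keep waiting
    for i in invalidated:
        del blocks[i]
    freed = len(invalidated)
    invalidated.clear()
    return freed


base = b"A" * 4000
blocks = {
    0: base + b"x" * 96,
    1: base + b"y" * 96,
    2: base + b"z" * 96,
    3: bytes(range(256)) * 16,
    4: b"\x01\x02" * 2048,
}
similar, unique = clusterize(blocks)
compressed, invalidated = ingest(blocks)
freed = defrag(blocks, invalidated)
print(similar, unique, freed)   # [[0, 1, 2]] [3, 4] 5
```

The uncompressed originals stay in place until `defrag` runs, which mirrors the two-stage behavior described above: marking for deletion happens at ingestion time, and the permanent delete happens later in bulk.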