Data reduction is a post-process activity. New data is written to the cluster uncompressed. The data reduction process then runs as a background task with lower priority than the tasks serving user IO requests. Data reduction starts once enough data has been written to the filesystems.
Data reduction tasks
- Ingestion
- Clusterization: Applied to data at the 4K block level. The system identifies similarity across uncompressed data in all filesystems enabled for data reduction.
- Compression: The system reads similar and unique blocks, compressing each type separately. Compressed data is then written to the filesystem.
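
The clusterization and compression tasks above can be illustrated with a minimal Python sketch. It is a toy model, not the system's actual algorithm: the `fingerprint` function (which treats blocks sharing the same first 64 bytes as similar), the use of `zlib`, and all function names are assumptions made for illustration only. The sketch splits data into 4K blocks, groups blocks that look similar, and compresses similar and unique blocks separately.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # 4K block level, as described above


def split_blocks(data, size=BLOCK_SIZE):
    """Split a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block, prefix_len=64):
    """Hypothetical similarity key: the first bytes of the block.

    A real system would use a far more robust similarity measure;
    this is only a stand-in so the example stays short.
    """
    return block[:prefix_len]


def clusterize(blocks):
    """Group block indices by fingerprint.

    Returns (similar, unique): `similar` is a list of index groups with
    two or more members, `unique` is a list of indices with no match.
    """
    groups = defaultdict(list)
    for idx, block in enumerate(blocks):
        groups[fingerprint(block)].append(idx)
    similar = [ids for ids in groups.values() if len(ids) > 1]
    unique = [ids[0] for ids in groups.values() if len(ids) == 1]
    return similar, unique


def compress_separately(blocks, similar, unique):
    """Compress each similar group as one stream and each unique block alone."""
    similar_blobs = [
        zlib.compress(b"".join(blocks[i] for i in ids)) for ids in similar
    ]
    unique_blobs = [zlib.compress(blocks[i]) for i in unique]
    return similar_blobs, unique_blobs


if __name__ == "__main__":
    # Two blocks that share a prefix, plus one unrelated block.
    block_a = b"A" * BLOCK_SIZE
    block_b = b"A" * 4000 + b"B" * 96
    block_c = bytes(range(256)) * 16
    blocks = split_blocks(block_a + block_b + block_c)

    similar, unique = clusterize(blocks)
    similar_blobs, unique_blobs = compress_separately(blocks, similar, unique)
    print("similar groups:", similar)
    print("unique blocks:", unique)
```

Compressing each similar group as a single stream lets the compressor exploit redundancy across blocks, which is one plausible reason to handle similar and unique blocks separately.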