How data is stored with the capacity saving function

Provisioning Guide for VSP One Block

Version
10.2.x
Part Number
MK-23VSP1B012-00

This figure shows how data is stored by using the capacity saving function.



When the post-process mode is applied, data received by the storage controller is first stored in a temporary area in the pool. When the data is classified as inactive (five minutes since the last update for Dynamic Provisioning), capacity saving processing is performed, and the processed data is stored in the data storage area. When that data is updated again, the previous copy in the data storage area is no longer required. Such data is called garbage data. The used capacity of the pool keeps increasing until garbage collection reclaims the garbage data. The pool capacity that is eventually required is the sum of the physical data capacity after capacity saving and the amount of metadata.

  • The temporary area and the data storage area are not assigned fixed capacities. They share the pool and use the pool as needed.
  • The temporary area is used only when the post-process mode is applied. When the inline mode is applied, capacity saving processing is performed simultaneously with the receipt of data from the host, and host data is not stored in the temporary area.
  • When the capacity saving function is enabled, garbage data is created by the following operations and consumes pool capacity:
    • Updating data on a DP-VOL with compression set.
    • Updating data that is stored on a DP-VOL with deduplication and compression set and is not also stored on a different DP-VOL.
    • Deleting a Thin Image Advanced pair.

The capacity overhead associated with the capacity saving function includes these items:

Capacity consumed by metadata
The capacity consumed by metadata for the capacity saving function (deduplication and compression) is approximately 3% of the consumed DP-VOL capacity that has been processed by capacity saving. For example, if the consumed capacity of a DP-VOL is 150 TB and the capacity saving feature has processed 100 TB of the 150 TB consumed capacity and reduced it to 30 TB, the capacity consumed by metadata for the capacity saving function is approximately 3 TB (3% of 100 TB). The total consumed capacity of this DP-VOL at this instant is 83 TB (30 TB + 50 TB + 3 TB).
Capacity consumed by garbage (invalid) data
The capacity consumed by garbage data is approximately 7% of the total consumed capacity of all DP-VOLs with capacity saving enabled. The capacity is dynamically consumed based on garbage data created by the capacity saving process and cleaned by the background garbage collection process. Garbage collection is a background process with a lower priority than host I/O, so the capacity consumed by garbage data depends on both the garbage created and the host I/O rate.
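
The metadata example above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not a sizing tool: the function names are hypothetical, the 3% figure is the approximation stated above, and garbage data is deliberately left out because its amount varies with host I/O.

```python
def metadata_overhead_tb(processed_tb: float, metadata_pct: float = 3.0) -> float:
    """Approximate metadata capacity: about 3% of the consumed capacity
    that capacity saving has already processed."""
    return processed_tb * metadata_pct / 100


def total_consumed_tb(consumed_tb: float, processed_tb: float, reduced_tb: float) -> float:
    """Reduced data + data not yet processed + metadata (garbage data ignored)."""
    unprocessed_tb = consumed_tb - processed_tb
    return reduced_tb + unprocessed_tb + metadata_overhead_tb(processed_tb)


# Values from the example: 150 TB consumed, 100 TB processed and reduced to 30 TB.
print(metadata_overhead_tb(100))        # 3.0
print(total_consumed_tb(150, 100, 30))  # 83.0
```

The result matches the 83 TB in the example: 30 TB of reduced data, 50 TB still waiting to be processed, and 3 TB of metadata.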

For a DRS-VOL, the total capacity consumed by metadata and garbage data is about 13% of the pool capacity. The pool capacity is dynamically consumed based on the data reduction processing usage.

During periods of high write activity from the host, this capacity might temporarily exceed 13%, and then return to around 13% when host write activity decreases. For a DRD-VOL with capacity saving enabled by using deduplication and compression, the metadata and garbage data consume about 10% of the pool capacity.

When the free space in a pool drops to 1% or less, or to 120 GB or less, capacity saving processing might stop, or performance might degrade. This condition continues until the free capacity is 1% or more of the pool and is also 240 GB or more.
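
The two thresholds above behave like a hysteresis: the condition starts at one free-space level and clears only at a higher one. The following sketch illustrates that behavior for monitoring purposes. The class and constant names are hypothetical, and reading the start condition as "1% or less, or 120 GB or less" is an interpretation of the text, not a documented algorithm.

```python
from dataclasses import dataclass

ENTER_FREE_PCT = 1.0   # condition starts at 1% free or less ...
ENTER_FREE_GB = 120.0  # ... or at 120 GB free or less
EXIT_FREE_PCT = 1.0    # condition clears only at 1% free or more ...
EXIT_FREE_GB = 240.0   # ... and 240 GB free or more


@dataclass
class PoolFreeSpaceMonitor:
    """Tracks whether a pool is in the low-free-space condition (hypothetical helper)."""
    pool_capacity_gb: float
    constrained: bool = False

    def update(self, free_gb: float) -> bool:
        free_pct = free_gb * 100 / self.pool_capacity_gb
        if self.constrained:
            # Recovery requires both the percentage and the absolute threshold.
            if free_pct >= EXIT_FREE_PCT and free_gb >= EXIT_FREE_GB:
                self.constrained = False
        elif free_pct <= ENTER_FREE_PCT or free_gb <= ENTER_FREE_GB:
            self.constrained = True
        return self.constrained


monitor = PoolFreeSpaceMonitor(pool_capacity_gb=10_000)
for free_gb in (500, 100, 200, 300):
    print(free_gb, monitor.update(free_gb))
```

In this run, 200 GB free is still reported as constrained even though it is above 120 GB, because the pool has not yet recovered to 240 GB.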

Capacity saving processing for existing data

Deduplication and compression are performed asynchronously on pages that already store data. This increases the free area of the pool, which can reduce the cost of purchasing drives over time.


Figure: Applying capacity saving

Capacity saving processing for new write data

The capacity saving mode of a DP-VOL (post-process mode or inline mode) determines how capacity saving is applied to new write data from the host:

Inline mode (default)
When you apply capacity saving with the inline mode to a DP-VOL, the compression and deduplication processing are performed synchronously for new write data. The inline mode minimizes the pool capacity required to store new write data but can impact I/O performance more than the post-process mode. The inline mode should be applied when writing data with sequential I/Os, for example, when writing data to target volumes of data migration or secondary volumes of copy pairs. When the data migration or copy pair creation has completed, the mode should be changed from the inline mode to the post-process mode to minimize the impact on I/O performance.
If you want to change the default mode (inline) to post-process mode, you must use CCI (raidcom add ldev [-capacity_saving_mode <saving mode>] or raidcom modify ldev [-capacity_saving_mode <saving mode>]).
Post-process mode
When you apply capacity saving with the post-process mode to a DP-VOL, the compression and deduplication processing are performed asynchronously for new write data. Since capacity saving processing is not performed at the time the new data is written, the post-process mode can reduce the impact of capacity saving processing on I/O performance. However, pool capacity is required to store the new write data until the capacity saving processing is performed.

This example shows how the pool used capacity changes over time when performing data migration. The red line shows the capacity when the post-process mode is applied, and the black line shows the capacity when the inline mode is applied. This example assumes that the writing speed (GB/h) for the new data is faster than the initial capacity saving processing (GB/h).


Figure: Change in pool used capacity over time for inline and post-process modes

When inline mode is applied, capacity saving processing is performed synchronously for the writing of data. When post-process mode is applied, capacity saving processing is performed asynchronously for the writing of data, and a temporary storage area is required for the write data. The capacity required for the temporary storage area depends on the writing speed of the new data or on the frequency of data updates during migration.
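
The difference between the two curves can be approximated with a simple model. This is an illustrative sketch only: the function name, parameters, and rates are hypothetical, and the model ignores metadata, garbage data, and the delay before data is classified as inactive. It assumes, as in the example above, that new data is written faster than capacity saving can process it.

```python
def pool_used_gb(t_h: float, mode: str, write_gbph: float, saving_gbph: float,
                 total_gb: float, reduction_ratio: float) -> float:
    """Estimated pool used capacity t_h hours after a migration starts.

    reduction_ratio is stored size divided by written size after capacity saving
    (an assumed value, for example 0.5 for a 2:1 reduction).
    """
    written = min(write_gbph * t_h, total_gb)
    if mode == "inline":
        processed = written  # capacity saving runs synchronously with the write
    elif mode == "post_process":
        processed = min(saving_gbph * t_h, written)  # runs asynchronously, may lag
    else:
        raise ValueError(f"unknown mode: {mode}")
    temporary_area = written - processed  # data still waiting in the temporary area
    return temporary_area + processed * reduction_ratio


# Assumed workload: 1000 GB migrated at 100 GB/h, saving processed at 50 GB/h, 2:1 reduction.
for t in (5, 10, 20):
    inline = pool_used_gb(t, "inline", 100, 50, 1000, 0.5)
    post = pool_used_gb(t, "post_process", 100, 50, 1000, 0.5)
    print(t, inline, post)
```

With these assumed numbers, post-process mode peaks at 750 GB when the migration ends at hour 10, while inline mode never exceeds 500 GB. Both converge to 500 GB once post-process capacity saving catches up at hour 20.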

This table shows the processing method (synchronous or asynchronous) for initial data, new write data, and updated data. For new write data, capacity saving processing is performed at different times for post-process mode and inline mode.

Mode              | Initial data (1) | New write data                    | Updated write data
                  |                  | Compression     | Deduplication   | Compression     | Deduplication
Post-process mode | Asynchronous     | Synchronous (4) | Asynchronous    | Synchronous (3) | Asynchronous
Inline mode       | Asynchronous     | Synchronous     | Synchronous (2) | Synchronous (3) | Asynchronous

Notes

  1. The initial data is the existing data on the DP-VOL when the capacity saving function is enabled. The (initial) capacity saving processing is performed for the initial data.
  2. Applied to sequential I/O data, such as writing large amounts of data sequentially. Deduplication of random I/O data, such as updating files irregularly, is performed in post-process mode.
  3. Indicates the compression method of the written data when compressed data is updated. If data that is still uncompressed because initial capacity saving processing has not yet been performed is updated, compression of the written data is performed in post-process mode.
  4. For a DRS-VOL, the compression processing is always performed synchronously even if the capacity saving mode is post-process mode. For a DRD-VOL with deduplication and compression that does not include the DRS-VOL, the compression processing is performed asynchronously.
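
For scripting or reporting purposes, the table and note 4 can be encoded as a small lookup. This is a sketch under stated assumptions: the dictionary keys, the function name, and the volume type strings are hypothetical labels, and the conditions in notes 2 and 3 (sequential versus random I/O, and updates to data that is not yet compressed) are not modeled.

```python
SYNC, ASYNC = "synchronous", "asynchronous"

# Values transcribed from the table above.
TIMING = {
    "post_process": {
        "initial": ASYNC,
        "new_compression": SYNC,       # per note 4, applies to a DRS-VOL
        "new_deduplication": ASYNC,
        "updated_compression": SYNC,   # see note 3
        "updated_deduplication": ASYNC,
    },
    "inline": {
        "initial": ASYNC,
        "new_compression": SYNC,
        "new_deduplication": SYNC,     # see note 2
        "updated_compression": SYNC,   # see note 3
        "updated_deduplication": ASYNC,
    },
}


def processing_timing(mode: str, data_kind: str, volume_type: str = "DRS-VOL") -> str:
    """Return when capacity saving runs for the given mode and kind of data."""
    # Note 4: in post-process mode, a DRD-VOL compresses new write data asynchronously.
    if mode == "post_process" and data_kind == "new_compression" and volume_type == "DRD-VOL":
        return ASYNC
    return TIMING[mode][data_kind]


print(processing_timing("post_process", "new_compression", "DRS-VOL"))
print(processing_timing("post_process", "new_compression", "DRD-VOL"))
print(processing_timing("inline", "new_deduplication"))
```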