Estimated time before completion of Rebuild

Virtual Storage Platform One SDS Block Storage Administrator Guide

Version
1.17.x
Part Number
MK-24VSP1SDS002-04

The estimated time [min] before completion of Rebuild can be calculated as follows:

Data capacity to be rebuilt [TiB] × unit processing time for Rebuild [min]

The following table shows the data capacities to be rebuilt.

Category

Data capacities to be rebuilt

Normal Rebuild

For a drive failure

Physical capacity of the drive in which a failure occurred

For storage node maintenance recovery

  • When the rebuild capacity policy is "Variable"

    Physical capacity of the storage node to be recovered

  • When the rebuild capacity policy is "Fixed"

    Capacity obtained by subtracting the allocated rebuild capacity from the physical capacity on the storage node to be recovered

Here, the rebuild capacity is derived as follows:

  1. Verify the number of rebuildable drives (numberOfDrives) of the rebuildable resources (rebuildableResources) of the storage node to be recovered.

  2. Select as many drives as numberOfDrives on the storage node to be recovered, in descending order of capacity.

  3. The total physical capacity of the selected drives is the rebuild capacity.

    For details about the rebuild capacity, see Rebuild capacity of a storage pool in this document.

In some cases, the number of rebuildable drives (numberOfDrives) of the rebuildable resources (rebuildableResources) of the storage node to be recovered cannot be referenced (for example, when the status of the storage node is "PersistentBlockage"). If you need an estimate in such a case, refer to the number of times Rebuild can be performed (numberOfDrives) in the storage pool information.

Fast Rebuild

Data capacity updated by write I/O to the drive when the storage node is blocked. The capacity varies depending on write I/O load conditions. Write I/O to the drive might occur asynchronously with I/O from the compute node.

However, even in areas having no write I/O, the processing time is approximately three seconds per 100 GiB.

Tip:

For how to verify whether there is write I/O to the drive, see Obtaining a list of low-resolution performance information about drives or Obtaining a list of high-resolution performance information about drives.
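The derivation of the data capacity to be rebuilt for storage node maintenance recovery, described above, can be sketched as follows. This is a minimal illustration, not product code: the function names are hypothetical, the drive capacities are entered by hand, and the sketch assumes that the physical capacity of a storage node equals the sum of the physical capacities of its drives.

```python
def rebuild_capacity_tib(drive_capacities_tib, number_of_drives):
    """Rebuild capacity: total capacity of the `number_of_drives` largest drives."""
    # Step 2: pick drives in descending order of capacity.
    largest = sorted(drive_capacities_tib, reverse=True)[:number_of_drives]
    # Step 3: the total physical capacity of the selected drives.
    return sum(largest)


def capacity_to_rebuild_tib(drive_capacities_tib, number_of_drives, policy):
    """Data capacity to be rebuilt for storage node maintenance recovery."""
    # Assumption: node physical capacity = sum of its drive capacities.
    physical = sum(drive_capacities_tib)
    if policy == "Variable":
        return physical
    if policy == "Fixed":
        return physical - rebuild_capacity_tib(drive_capacities_tib, number_of_drives)
    raise ValueError(f"unknown rebuild capacity policy: {policy!r}")


# Hypothetical node with four drives and numberOfDrives = 1.
drives = [3.5, 3.5, 1.75, 1.75]
print(capacity_to_rebuild_tib(drives, 1, "Variable"))  # 10.5
print(capacity_to_rebuild_tib(drives, 1, "Fixed"))     # 7.0
```

In practice, numberOfDrives would be read from the rebuildableResources of the storage node (or from the storage pool information when the node cannot be referenced) rather than entered manually.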

Estimate the unit processing time [min] for Rebuild as follows, based on the storage pool usage (usedCapacityRate [%]). The unit processing time differs between areas of the drive that contain user data (used areas) and areas that do not (unused areas).

Rebuild type      Unit processing time for Rebuild [min]

Normal Rebuild    (Usage of the storage pool [%] / 100) × unit processing time for Rebuild of used areas [min] + ((100 - usage of the storage pool [%]) / 100) × unit processing time for Rebuild of unused areas [min]

Fast Rebuild      Unit processing time for Rebuild of used areas [min]

The unit processing time for Rebuild of used areas and unused areas varies depending on the user data protection method (redundantType) setting and the resource usage rate of the internal processing I/O (asyncProcessingResourceUsageRate).

Note that the asyncProcessingResourceUsageRate setting defines the resource utilization for internal processing I/O (Rebuild and drive data relocation). To change the setting, see Changing usage of internal processing I/O resources.

                                                 Unit processing time for Rebuild [min]
redundantType  asyncProcessingResourceUsageRate  Used areas   Unused areas

4D+1P          VeryHigh                          30           8
4D+1P          High                              60           12
4D+1P          Middle                            80           15
4D+1P          Low                               200          30

4D+2P          VeryHigh                          45           10
4D+2P          High                              90           15
4D+2P          Middle                            120          20
4D+2P          Low                               210          30

Duplication    VeryHigh                          15           8
Duplication    High                              25           12
Duplication    Middle                            35           15
Duplication    Low                               160          80
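The calculation above can be sketched as a short script. This is an illustrative sketch only: the function and variable names are hypothetical, the table values are copied from this section, and the unit processing time is treated as minutes per TiB, which is implied by the estimation formula (capacity [TiB] × unit processing time [min]) rather than stated explicitly.

```python
# (used areas, unused areas) unit processing times [min], from the table above.
UNIT_TIMES_MIN = {
    "4D+1P": {"VeryHigh": (30, 8), "High": (60, 12), "Middle": (80, 15), "Low": (200, 30)},
    "4D+2P": {"VeryHigh": (45, 10), "High": (90, 15), "Middle": (120, 20), "Low": (210, 30)},
    "Duplication": {"VeryHigh": (15, 8), "High": (25, 12), "Middle": (35, 15), "Low": (160, 80)},
}


def unit_time_min(redundant_type, usage_rate, used_capacity_rate, fast_rebuild=False):
    """Unit processing time for Rebuild [min] for the given settings."""
    used, unused = UNIT_TIMES_MIN[redundant_type][usage_rate]
    if fast_rebuild:
        # Fast Rebuild uses only the used-area figure.
        return float(used)
    if not 0 <= used_capacity_rate <= 100:
        raise ValueError("used_capacity_rate must be between 0 and 100")
    # Normal Rebuild: weight used and unused areas by storage pool usage.
    return (used_capacity_rate / 100) * used + ((100 - used_capacity_rate) / 100) * unused


def estimated_rebuild_time_min(capacity_tib, redundant_type, usage_rate,
                               used_capacity_rate, fast_rebuild=False):
    """Estimated time before completion of Rebuild [min]."""
    return capacity_tib * unit_time_min(
        redundant_type, usage_rate, used_capacity_rate, fast_rebuild
    )


# Example: 4 TiB to rebuild, 4D+2P, High, storage pool usage 50%.
print(unit_time_min("4D+2P", "High", 50))                  # 52.5
print(estimated_rebuild_time_min(4, "4D+2P", "High", 50))  # 210.0
```

In this example, the unit processing time is 0.5 × 90 + 0.5 × 15 = 52.5 min, so rebuilding 4 TiB is estimated at 210 min. The result is only as reliable as the measurement prerequisites listed in this section.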

The Rebuild processing time [min] is based on measurement results in a configuration that meets the following prerequisites:

  • Bare metal model

  • With I/O load

  • Number of storage nodes: 6

  • Number of installed user data drives per storage node: 8

  • User data drive to be used: SAS SSD

  • When asyncProcessingResourceUsageRate is High, Middle, or Low: The internode network bandwidth is 10 Gbps.

    When asyncProcessingResourceUsageRate is VeryHigh: The internode network bandwidth is 25 Gbps.

  • MTU size of the network switch: Set to 9000

  • The usage rate for the maximum logical capacity that can be managed on a storage controller (allocatableCapacityUsageRate [%]) is basically uniform across storage controllers.

The rebuild processing time [min] may increase or decrease according to the following conditions.

  • I/O load

  • Number of installed user data drives per storage node

    If the number of installed user data drives per storage node is less than eight, the Rebuild might take longer due to the increased I/O load per user data drive.

    If the number of installed user data drives is nine or more, the rebuild processing time might be shorter than the estimated time due to the reduced I/O load per user data drive.

  • Number of storage nodes in the case of HPEC

    If the number of storage nodes is less than six, the rebuild processing time might be longer than the estimated time due to the increased I/O load per storage node.

    If the number of storage nodes is seven or more, the rebuild processing time might be shorter than the estimated time due to the reduced I/O load per storage node.

  • Status among storage controllers for allocatableCapacityUsageRate [%]

    If allocatableCapacityUsageRate [%] (usage rate against the maximum logical capacity that can be managed on the storage controller) is unbalanced between storage controllers, the rebuild processing time might differ from the estimated time.

  • Immediately after an operation or internal processing that reduces the storage pool usage *

    Immediately after such an operation or internal processing, the rebuild processing time might be longer than the estimated time because used areas remain on the drive.

    * The following operations and internal processing reduce the storage pool usage rate:

    • Deleting volumes

    • Deleting snapshots

    • Adding storage nodes

    • Capacity balance

    • Garbage collection by the data reduction function

    • I/O by the UNMAP command

    • I/O by the WRITE SAME command

Note:
  • When you perform storage node maintenance recovery or replacement for multiple storage nodes, the Rebuild takes as long as (the rebuild processing time per storage node [min]) × (number of storage nodes to be recovered or replaced).

  • If Rebuild is triggered by a drive failure, the Rebuild might take longer than the estimated time [min] because the data layout among drives changes. To verify the time required to complete the Rebuild, check event log KARS07003-I.
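The multiplication in the first note above can be written out as a small helper. This is a sketch with a hypothetical function name, and it assumes every storage node has the same per-node estimate.

```python
def total_rebuild_time_min(per_node_time_min, node_count):
    """Total Rebuild time [min] when several storage nodes are recovered or replaced."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Assumption: each node takes the same per-node rebuild processing time.
    return per_node_time_min * node_count


# Example: a per-node estimate of 210 min and three storage nodes to recover.
print(total_rebuild_time_min(210, 3))  # 630
```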

CAUTION:

If the physical capacity of a storage node exceeds the capacity that can be allocated to a storage controller, the calculated approximate processing time might be longer. For details about the maximum capacity that can be allocated to a storage controller, contact customer support.

When Rebuild is being performed, you can use event log KARS07003-I to confirm the approximate remaining time [min] to complete the Rebuild. However, the time required until the Rebuild is completed might vary depending on the following conditions:

  • Network switch performance and I/O load

  • When Rebuild was suspended due to failures, maintenance operation, or other causes, and then the Rebuild was performed again

  • When the storage cluster was shut down during Rebuild

  • When Fast Rebuild changed to Normal Rebuild

For how to obtain a list of event logs, see Obtaining a list of event logs.

Note:
  • The progress rate and remaining time displayed in KARS07003-I have the following meanings:

    • Progress rate [%]: The progress rate of the rebuild process for all data to be rebuilt. If all data is rebuilt successfully, the value reaches 100 [%]. If any data failed to rebuild, the value does not reach 100 [%].

    • Remaining time [min]: The approximate time before the rebuild process ends (completes or suspends). The value becomes 0 [min] even if there is data that failed to rebuild.

  • If any data fails to rebuild, the rebuild process terminates once. The rebuild process is then run again and retries the failed data.

  • Event log KARS07003-I is output when the progress rate fluctuates by 10 points or more*, or the progress rate becomes 100%.

    * The progress rate might move backward due to the internal processing or storage system status. The remaining time displayed when the progress rate is moving backward might be longer than the actual remaining time.

  • The remaining time to complete the Rebuild displayed in KARS07003-I tends to be less accurate while the progress rate is low. As the Rebuild progresses, the time is gradually corrected to match actual conditions.