The estimated time [min] before completion of Rebuild can be calculated as follows:

Estimated time [min] = data capacity to be rebuilt [TiB] × unit processing time for Rebuild [min]
The following table shows the data capacities to be rebuilt.
| Category | | Data capacity to be rebuilt |
|---|---|---|
| Normal Rebuild | For a drive failure | Physical capacity of the drive in which a failure occurred |
| Normal Rebuild | For storage node maintenance recovery | Physical capacity of the drive × number of rebuildable drives (numberOfDrives) of the rebuildable resources (rebuildableResources) for the storage node to be recovered. Sometimes numberOfDrives cannot be referenced (for example, when the status of the storage node is "PersistentBlockage"). If you need an estimate in such a case, see the number of times Rebuild can be performed (numberOfDrives) in the storage pool information. |
| Fast Rebuild | | Data capacity updated by write I/O to the drive while the storage node is blocked. The capacity varies depending on write I/O load conditions. Write I/O to the drive might occur asynchronously with I/O from the compute node. However, even in areas having no write I/O, the processing time is approximately three seconds per 100 GiB. For how to verify whether there is write I/O to the drive, see Obtaining a list of low-resolution performance information about drives or Obtaining a list of high-resolution performance information about drives. |
Estimate the unit processing time [min] for Rebuild as follows, based on the storage pool usage (usedCapacityRate [%]). The formula distinguishes areas of the drive that contain user data (used areas) from areas that do not (unused areas).

| Rebuild type | Unit processing time for Rebuild [min] |
|---|---|
| Normal Rebuild | (Usage of the storage pool [%] / 100) × unit processing time for Rebuild of used areas [min] + ((100 − usage of the storage pool [%]) / 100) × unit processing time for Rebuild of unused areas [min] |
| Fast Rebuild | Unit processing time for Rebuild of used areas [min] |
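The weighted-average formula above can be sketched in a few lines of Python. This is an illustrative helper, not part of the product API; the unit-time values are taken from the redundantType table in this section:

```python
def unit_rebuild_time(rebuild_type, pool_used_pct, t_used, t_unused):
    """Unit processing time for Rebuild [min].

    pool_used_pct: storage pool usage (usedCapacityRate) [%]
    t_used, t_unused: unit processing times for used/unused areas [min],
    taken from the redundantType table in this section.
    """
    if rebuild_type == "Fast":
        # Fast Rebuild processes used areas only.
        return t_used
    # Normal Rebuild: average of used/unused unit times,
    # weighted by the storage pool usage.
    return (pool_used_pct / 100 * t_used
            + (100 - pool_used_pct) / 100 * t_unused)

# Example: 4D+1P with VeryHigh (30 min used, 8 min unused), pool 40% used
print(unit_rebuild_time("Normal", 40, 30, 8))  # roughly 16.8 min
```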
The unit processing time for Rebuild of used areas and unused areas varies depending on the user data protection method (RedundantType) settings and the resource usage rate of the internal processing I/O (asyncProcessingResourceUsageRate).
Note that the asyncProcessingResourceUsageRate setting defines the resource utilization for internal processing I/O (rebuild, drive data relocation). To change the setting, see Changing usage of internal processing I/O resources.
| redundantType | asyncProcessingResourceUsageRate | Unit processing time for Rebuild of used areas [min] | Unit processing time for Rebuild of unused areas [min] |
|---|---|---|---|
| 4D+1P | VeryHigh | 30 | 8 |
| 4D+1P | High | 60 | 12 |
| 4D+1P | Middle | 80 | 15 |
| 4D+1P | Low | 200 | 30 |
| 4D+2P | VeryHigh | 45 | 10 |
| 4D+2P | High | 90 | 15 |
| 4D+2P | Middle | 120 | 20 |
| 4D+2P | Low | 210 | 30 |
| Duplication | VeryHigh | 15 | 8 |
| Duplication | High | 25 | 12 |
| Duplication | Middle | 35 | 15 |
| Duplication | Low | 160 | 80 |
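Putting the capacity formula and both tables together, the overall estimate can be scripted. A minimal sketch in Python; the UNIT_TIMES dictionary transcribes the table above, and estimate_rebuild_minutes is an illustrative helper, not a product API:

```python
# Unit processing times [min] as (used areas, unused areas),
# keyed by (redundantType, asyncProcessingResourceUsageRate).
UNIT_TIMES = {
    ("4D+1P", "VeryHigh"): (30, 8),
    ("4D+1P", "High"): (60, 12),
    ("4D+1P", "Middle"): (80, 15),
    ("4D+1P", "Low"): (200, 30),
    ("4D+2P", "VeryHigh"): (45, 10),
    ("4D+2P", "High"): (90, 15),
    ("4D+2P", "Middle"): (120, 20),
    ("4D+2P", "Low"): (210, 30),
    ("Duplication", "VeryHigh"): (15, 8),
    ("Duplication", "High"): (25, 12),
    ("Duplication", "Middle"): (35, 15),
    ("Duplication", "Low"): (160, 80),
}

def estimate_rebuild_minutes(capacity_tib, redundant_type, usage_rate,
                             pool_used_pct, fast=False):
    """Estimated time [min] = capacity to rebuild [TiB] x unit time [min]."""
    t_used, t_unused = UNIT_TIMES[(redundant_type, usage_rate)]
    if fast:
        unit = t_used  # Fast Rebuild processes used areas only
    else:
        # Normal Rebuild: weighted by storage pool usage
        unit = (pool_used_pct / 100 * t_used
                + (100 - pool_used_pct) / 100 * t_unused)
    return capacity_tib * unit

# Example: 7.6 TiB failed drive, 4D+2P, High, storage pool 50% used
print(round(estimate_rebuild_minutes(7.6, "4D+2P", "High", 50), 1))  # → 399.0
```

Remember that the result is only an approximation under the measurement prerequisites listed below; the conditions that follow can lengthen or shorten the actual time.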
The Rebuild processing time [min] is based on measurement results in a configuration that meets the following prerequisites:

- Bare metal model
- With I/O load
- Number of storage nodes: 6
- Number of installed user data drives per storage node: 8
- User data drive to be used: SAS SSD
- Internode network bandwidth: 10 Gbps when asyncProcessingResourceUsageRate is High, Middle, or Low; 25 Gbps when it is VeryHigh
- MTU size of the network switch: set to 9000
- The usage rate for the maximum logical capacity that can be managed on a storage controller (allocatableCapacityUsageRate [%]) is basically uniform across storage controllers
The rebuild processing time [min] might increase or decrease depending on the following conditions.

- I/O load
- Number of installed user data drives per storage node

  If the number of installed user data drives per storage node is less than eight, the Rebuild might take longer than the estimated time due to the increased I/O load per user data drive. If it is nine or more, the rebuild processing time might be shorter than the estimated time due to the reduced I/O load per user data drive.

- Number of storage nodes in the case of HPEC

  If the number of storage nodes is less than six, the rebuild processing time might be longer than the estimated time due to the increased I/O load per storage node. If it is seven or more, the rebuild processing time might be shorter than the estimated time due to the reduced I/O load per storage node.

- Balance of allocatableCapacityUsageRate [%] among storage controllers

  If allocatableCapacityUsageRate [%] (the usage rate against the maximum logical capacity that can be managed on the storage controller) is unbalanced among storage controllers, the rebuild processing time might differ from the estimated time.

- Immediately after an operation or internal processing that reduces storage pool usage*

  The rebuild processing time might be longer than the estimated time immediately after such an operation or internal processing, because used areas remain on the drive.

\* The storage pool usage rate is reduced when the following operations or internal processing are performed:

- Deleting volumes
- Deleting snapshots
- Adding storage nodes
- Capacity balance
- Garbage collection by the data reduction function
- I/O by the UNMAP command
- I/O by the WRITE SAME command
- When performing a storage node maintenance recovery or replacement for multiple storage nodes, the rebuild takes as long as (rebuild processing time per storage node [min]) × (number of storage nodes to be recovered or replaced).
- If Rebuild is triggered by a drive failure, the Rebuild might take longer than the estimated time [min] due to a change of the data layout among drives. To verify the time required to complete the Rebuild, check event log KARS07003-I.
- If the physical capacity of a storage node exceeds the capacity that can be allocated to a storage controller, the calculated approximate processing time might be longer. For details about the maximum capacity that can be allocated to a storage controller, contact customer support.
When Rebuild is being performed, you can use event log KARS07003-I to confirm the approximate remaining time [min] until the Rebuild is completed. However, the time required until the Rebuild is completed might vary depending on the following conditions:

- Network switch performance and I/O load
- When Rebuild was suspended due to failures, maintenance operations, or other causes, and then the Rebuild was performed again
- When the storage cluster was shut down during Rebuild
- When Fast Rebuild changed to Normal Rebuild

For how to obtain a list of event logs, see Obtaining a list of event logs.
- The progress rate and remaining time displayed in KARS07003-I show the following information:
  - Progress rate [%]: the progress rate of the rebuild process against the entire data to be rebuilt. If all data is rebuilt successfully, the progress rate becomes 100 [%]. If some data fails to rebuild, it does not reach 100 [%].
  - Remaining time [min]: the approximate time before the rebuild process ends (completes or suspends). The remaining time becomes 0 [min] even if some data failed to rebuild.
- If any data fails in the rebuild process, the rebuild process is terminated once. The rebuild process is then reexecuted and tries to rebuild the failed data again.
- Event log KARS07003-I is output when the progress rate changes by 10 points or more*, or when the progress rate reaches 100%.

\* The progress rate might move backward due to internal processing or the storage system status. The remaining time displayed while the progress rate is moving backward might be longer than the actual remaining time.

- The remaining time to complete the Rebuild displayed in KARS07003-I tends to be less accurate while the progress rate is low. As the progress rate increases, the displayed time is gradually corrected to match actual conditions.