Use the following procedure to verify that storage node maintenance blocking can be performed.
-
Performing storage node maintenance blocking lowers the degree of redundancy for the elements that are made redundant, such as the user data, storage controller, or cluster master node. This means that fault tolerance is degraded until the storage node blocked for maintenance is recovered. Therefore, keep the scope of maintenance blocking to the minimum necessary.
-
Performing maintenance blocking of cluster master nodes might cause an error pop-up to be displayed in the VSP One SDS Block Administrator window or a connection error (such as login failure) to occur. Wait for a while (a maximum of 60 minutes), and then log in again. For details about how to verify that the connection destination is the cluster master node (primary), see Verifying the cluster master node (primary) in the VSP One SDS Block System Administrator Operation Guide.
-
Even after the following confirmation procedure has been performed, maintenance blocking might be unsuccessful depending on various conditions. In this case, see the output event logs, and then take appropriate action.
-
This procedure partially differs from that of Verifying the conditions for performing storage node maintenance blocking in the VSP One SDS Block System Administrator Operation Guide (because the VSP One SDS Block Administrator cannot display the spread placement group information for the cloud model or obtain it when storage node maintenance blocking is performed). When you use the VSP One SDS Block Administrator for storage node maintenance blocking, perform the following procedure.
-
Required role: Service
-
Verify the ID and STATUS of the storage node in the Storage Nodes list tab (in the Storage Nodes window) or Storage Node detailed information window.
If STATUS of the storage node to be blocked for maintenance is "Ready" or "RemovalFailed", go to the next step.
If there are storage nodes in any other statuses, take action as follows:
-
If STATUS is "TemporaryBlockage", "MaintenanceBlockage", "PersistentBlockage", "InstallationFailed", "RemovalFailedAndTemporaryBlockage", "RemovalFailedAndMaintenanceBlockage", or "RemovalFailedAndPersistentBlockage", the storage node is already blocked and its separation from the storage cluster is complete. Therefore, you do not have to perform maintenance blocking.
If you need to perform maintenance blocking again, you must first recover the target storage node.
-
If the VSP One SDS Block Administrator displays "Alerting" for "Health status" of a storage node, contact customer support.
However, "Alerting" is also displayed for "Health status" when only a storage node with the status "RemovalFailed" exists. In this case, no action is required.
-
If there is a storage node in any status other than those mentioned earlier, a process is being performed for such a storage node. Wait until STATUS changes, and then verify STATUS again.
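The status check in this step can be summarized as a small decision function. The following is a minimal Python sketch, not part of the product: the function name `action_for_node_status` and the returned labels are hypothetical, and only the status strings are taken from the lists above. It does not cover the separate "Health status" check.

```python
# Hypothetical helper: maps a storage node STATUS to the action in this step.
PROCEED_STATUSES = {"Ready", "RemovalFailed"}
ALREADY_BLOCKED_STATUSES = {
    "TemporaryBlockage",
    "MaintenanceBlockage",
    "PersistentBlockage",
    "InstallationFailed",
    "RemovalFailedAndTemporaryBlockage",
    "RemovalFailedAndMaintenanceBlockage",
    "RemovalFailedAndPersistentBlockage",
}


def action_for_node_status(status: str) -> str:
    """Return "proceed", "already_blocked", or "wait" for a STATUS value."""
    if status in PROCEED_STATUSES:
        return "proceed"  # go to the next step
    if status in ALREADY_BLOCKED_STATUSES:
        return "already_blocked"  # no blocking needed; recover first to block again
    return "wait"  # a process is running; verify STATUS again later
```

Any status outside the two listed groups is treated as "a process is being performed", which matches the last bullet of this step.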
-
-
Click the information icon on the navigation bar, and then select Storage Cluster Information.
Obtain information about the storage cluster to verify the state of the write back mode with cache protection.
Take the following action according to the state of the write back mode with cache protection (WRITE BACK MODE WITH CACHE PROTECTION).
State of the write back mode with cache protection | Action to be taken
Enabled | Go to the next step.
Disabled | Go to step 4.
Enabling | Enable the write back mode with cache protection according to Enabling the write back mode with cache protection in the VSP One SDS Block System Administrator Operation Guide, and then go to the next step. Alternatively, cancel the enabling of the write back mode with cache protection, and then go to step 4.
Disabling | Disable the write back mode with cache protection according to Disabling the write back mode with cache protection in the VSP One SDS Block System Administrator Operation Guide, and then go to step 4. Alternatively, cancel the disabling of the write back mode with cache protection, and then go to the next step.
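As a compact restatement of the table in this step, the following Python sketch maps each state to the options the procedure gives. The function name and the (action, next step) tuple layout are illustrative assumptions, and "step 3" is assumed to be the step that immediately follows this one.

```python
def options_for_write_back_state(state: str) -> list:
    """Return (action, next_step) options for a WRITE BACK MODE WITH CACHE PROTECTION state."""
    options = {
        "Enabled": [("none", 3)],
        "Disabled": [("none", 4)],
        # Transitional states: either finish the change or cancel it.
        "Enabling": [("complete enabling", 3), ("cancel enabling", 4)],
        "Disabling": [("complete disabling", 4), ("cancel disabling", 3)],
    }
    if state not in options:
        raise ValueError(f"unexpected state: {state!r}")
    return options[state]
```

In both transitional states the next step depends only on whether the mode ends up enabled (step 3) or disabled (step 4).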
-
In the Storage Cluster Information dialog, verify the summary of metadata redundancy for cache protection in write back mode with cache protection.
Confirm the user data protection type (REDUNDANT POLICY) in the Storage Pool window.
Verify whether the summary of metadata redundancy for cache protection in the write back mode with cache protection (METADATA REDUNDANCY SUMMARY) meets the conditions shown in the following table.
User data protection type (REDUNDANT POLICY) | Condition
(Bare metal) 4D+1P | The value of METADATA REDUNDANCY SUMMARY is 1.
4D+2P | The value of METADATA REDUNDANCY SUMMARY is 1 or 2.
Duplication | The value of METADATA REDUNDANCY SUMMARY is 1.
- If the conditions are met, go to the next step.
- If the conditions are not met, take action as follows.
Then, go to the next step.
- If the VSP One SDS Block Administrator displays "Alerting" for "Health status" of a storage node, take action as described in If a health status error is detected in the VSP One SDS Block Administrator in the VSP One SDS Block Troubleshooting Guide.
- If there are storage nodes whose STATUS is MaintenanceBlockage in the Storage Nodes list tab (in the Storage Nodes window), perform maintenance recovery for the nodes according to the procedure in Performing maintenance recovery for storage nodes.
- If the KARS06596-E event log is output, take action according to the instruction, and then wait until the metadata redundancy for cache protection is recovered.
CAUTION: If the storage node is blocked, the metadata redundancy for cache protection is not recovered until the storage node is recovered by a maintenance operation. Recover the blocked storage node first by performing a maintenance operation.
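The condition table in this step amounts to a lookup of allowed values per protection type. A minimal Python sketch follows; the names `ALLOWED_METADATA_REDUNDANCY` and `metadata_redundancy_ok` are hypothetical, and it assumes the displayed METADATA REDUNDANCY SUMMARY value has already been read as an integer.

```python
# Allowed METADATA REDUNDANCY SUMMARY values per REDUNDANT POLICY (from the table).
ALLOWED_METADATA_REDUNDANCY = {
    "4D+1P": {1},  # bare metal only
    "4D+2P": {1, 2},
    "Duplication": {1},
}


def metadata_redundancy_ok(redundant_policy: str, summary: int) -> bool:
    """Return True if the summary value meets the condition for the policy."""
    if redundant_policy not in ALLOWED_METADATA_REDUNDANCY:
        raise ValueError(f"unknown REDUNDANT POLICY: {redundant_policy!r}")
    return summary in ALLOWED_METADATA_REDUNDANCY[redundant_policy]
```

A `False` result corresponds to "the conditions are not met", in which case the recovery actions listed in this step apply before continuing.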
-
Determine whether Rebuild operation is being performed by confirming the following statuses in the Protection Domain window.
REBUILD STATUS indicates the operation status of the Rebuild operation, and REBUILD PROGRESS RATE indicates the progress rate of the Rebuild operation.
-
REBUILD STATUS
-
Stopped: Status in which Rebuild operation is not being performed.
-
Running: Status in which Rebuild operation is being performed. The Rebuild operation cannot be stopped. Wait until the Rebuild operation is completed, and then verify the Rebuild status in the Protection Domain window again.
-
Error: Status in which Rebuild processing cannot be performed due to an error. Verify the event logs and perform troubleshooting.
-
-
REBUILD PROGRESS RATE
Shows the progress rate (%) of Rebuild operation. The progress rate is updated when it fluctuates by 1 point or more. (When progress is made in a short period of time, such as when Fast Rebuild is performed, the progress rate might be updated by several points rather than by 1 point.)
If the Rebuild operation is not being performed, go to the next step.
If the Rebuild operation is being performed, wait until the operation completes. Then, go to the next step.
However, if the storage node to be blocked for maintenance was recovered immediately the last time it was blocked, you can go to the next step without waiting.
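The wait-and-recheck behavior for Rebuild can be sketched as a polling loop. This is only an illustration under stated assumptions: `get_status` is a hypothetical callable that returns the current REBUILD STATUS string (how it is obtained is not shown), and the polling interval and the `recovered_immediately_last_time` flag name are arbitrary choices.

```python
import time
from typing import Callable


def wait_for_rebuild(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    recovered_immediately_last_time: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll REBUILD STATUS until it is safe to continue; return the number of polls."""
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status == "Stopped":
            return polls  # Rebuild is not being performed
        if status == "Error":
            # Rebuild cannot proceed; check event logs and troubleshoot.
            raise RuntimeError("REBUILD STATUS is Error")
        if status != "Running":
            raise ValueError(f"unexpected REBUILD STATUS: {status!r}")
        if recovered_immediately_last_time:
            return polls  # exception described above: no need to wait
        sleep(poll_seconds)  # Rebuild cannot be stopped, so wait and re-check
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.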
-
-
(Bare metal) Verify that the Drive data relocation operation is not being performed by confirming the following statuses in the Protection Domain window.
DRIVE DATA RELOCATION STATUS shows the Drive data relocation status, and DRIVE DATA RELOCATION PROGRESS RATE shows the progress rate of Drive data relocation processing.
-
DRIVE DATA RELOCATION STATUS
-
Stopped: Status in which Drive data relocation is not being performed.
-
Running: Status in which Drive data relocation is being performed. When you want to suspend the Drive data relocation processing, contact customer support.
-
Error: Status in which Drive data relocation cannot be performed because it resulted in an error or because other conditions are not satisfied.
-
Suspended: Status in which Drive data relocation processing is suspended. Contact customer support.
-
-
DRIVE DATA RELOCATION PROGRESS RATE
Progress rate (%) is displayed whenever data is transferred for relocation.
Note: When Drive data relocation is suspended and then resumed, the progress rate restarts from 0. Only data for which processing did not complete before suspension is processed again.
In addition, when Drive data relocation is suspended, event log KARS07012-I or KARS07013-I is output.
If Drive data relocation is not being performed, go to the next step.
If Drive data relocation is being performed, wait until the operation completes or interrupt Drive data relocation, and then go to the next step.
-
-
Confirm the user data protection type and user data redundancy in the Storage Pool window.
Verify whether the user data redundancy (DATA REDUNDANCY) satisfies the conditions shown in the following table.
User data protection type (REDUNDANT POLICY) | Condition
(Bare metal) 4D+1P | The value of DATA REDUNDANCY is 1.
4D+2P | The value of DATA REDUNDANCY is 2.
Duplication | The value of DATA REDUNDANCY is 1.
-
If the conditions are met, the storage node can be blocked for maintenance, and the confirmation procedure is complete.
However, if there is a storage node that does not contain any drive that has been added to a storage pool by expansion, go to the next step for confirmation even if the conditions are met.
-
If the conditions are not met, go to the next step.
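The outcome of this step, including the exception for storage nodes without expanded drives, can be expressed as follows. This is a hedged Python sketch: the function name, the returned labels, and the `node_without_pool_drive` flag are hypothetical, and DATA REDUNDANCY is assumed to be available as an integer.

```python
# Required DATA REDUNDANCY value per REDUNDANT POLICY (from the table).
REQUIRED_DATA_REDUNDANCY = {
    "4D+1P": 1,  # bare metal only
    "4D+2P": 2,
    "Duplication": 1,
}


def data_redundancy_outcome(
    redundant_policy: str,
    data_redundancy: int,
    node_without_pool_drive: bool = False,
) -> str:
    """Return "can_block" or "check_failure_status" for this step."""
    if redundant_policy not in REQUIRED_DATA_REDUNDANCY:
        raise ValueError(f"unknown REDUNDANT POLICY: {redundant_policy!r}")
    met = data_redundancy == REQUIRED_DATA_REDUNDANCY[redundant_policy]
    if met and not node_without_pool_drive:
        return "can_block"  # confirmation procedure is complete
    return "check_failure_status"  # continue with the next step
```

Note that, unlike the metadata check, 4D+2P requires exactly 2 here, so a value of 1 sends you to the next step.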
-
-
Verify the status of the failure in the Storage Nodes list tab (in the Storage Nodes window) and Drives list window.
Verify STATUS of the storage node and drive to see if the applicable conditions are met.
User data protection type (REDUNDANT POLICY) | Number of fault domains | Condition
(Bare metal) 4D+1P | 1 | The total number of failure-status storage nodes and failure-status drives is 0. (See Notes 1 and 2.)
4D+2P | 1 | The total number of failure-status storage nodes and failure-status drives is 1 or less. (See Notes 1 and 2.)
4D+2P | 3 | Either of the following must be met:
- The total number of failure-status storage nodes and failure-status drives is 1 or less. (See Notes 1 and 2.)
- The failure-status storage nodes, failure-status drives, and storage nodes to be blocked for maintenance are all in the same fault domain. (See Note 1.)
Duplication | 1 | Either of the following must be met:
- The total number of failure-status storage nodes and failure-status drives is 0. (See Notes 1 and 2.)
- The failure-status storage nodes, failure-status drives, and storage nodes to be blocked for maintenance do not span both storage nodes that the redundant storage controllers belong to. Also, the failure-status storage nodes and storage nodes to be blocked for maintenance do not include two or more cluster master nodes in total. (See Notes 1 and 3.)
Duplication | 3 | Any one of the following must be met:
- The total number of failure-status storage nodes and failure-status drives is 0. (See Notes 1 and 2.)
- The failure-status storage nodes, failure-status drives, and storage nodes to be blocked for maintenance are all in the same fault domain. (See Note 1.)
- The failure-status storage nodes, failure-status drives, and storage nodes to be blocked for maintenance do not span both storage nodes that the redundant storage controllers belong to. Also, the failure-status storage nodes and storage nodes to be blocked for maintenance do not include two or more cluster master nodes in total. (See Notes 1 and 3.)
Notes:
1. A failure status refers to the following storage nodes and drives.
-
Failure status of storage nodes:
STATUS including the following character strings: "TemporaryBlockage", "MaintenanceBlockage", "TemporaryBlockageFailed", "MaintenanceBlockageFailed", "InstallationFailed", "PersistentBlockage", or "RemovalFailed"
-
Failure status of drives:
"Blockage"
2. The number of failures is counted as one for the following cases:
-
A failure-status drive exists in a failure-status storage node.
-
Multiple failure-status drives exist in the same storage node.
3. You can use either of the following to view the information about the storage controllers:
REST API: GET /v1/objects/storage-controllers
CLI: storage_controller_list
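The counting rule in Notes 1 and 2 is easy to get wrong by hand, so a short Python sketch may help. It is illustrative only: the function names and input shapes are assumptions (a mapping of storage node ID to STATUS, and pairs of storage node ID and drive STATUS), and it evaluates only the "total number of failures" condition, not the fault domain or storage controller alternatives.

```python
from typing import Iterable, Mapping, Tuple

# Note 1: a node is in failure status if STATUS includes any of these strings.
NODE_FAILURE_MARKERS = (
    "TemporaryBlockage", "MaintenanceBlockage", "TemporaryBlockageFailed",
    "MaintenanceBlockageFailed", "InstallationFailed", "PersistentBlockage",
    "RemovalFailed",
)
# Maximum total failures allowed by the first condition of each table row.
MAX_FAILURES = {"4D+1P": 0, "4D+2P": 1, "Duplication": 0}


def is_failure_node(status: str) -> bool:
    return any(marker in status for marker in NODE_FAILURE_MARKERS)


def count_failures(
    node_status: Mapping[str, str],
    drives: Iterable[Tuple[str, str]],
) -> int:
    """Count failures per Note 2: at most one failure per storage node."""
    failed_nodes = {n for n, s in node_status.items() if is_failure_node(s)}
    nodes_with_blocked_drives = {n for n, s in drives if s == "Blockage"}
    # The set union collapses "blocked drive in a failed node" and
    # "several blocked drives in one node" into a single failure each.
    return len(failed_nodes | nodes_with_blocked_drives)


def total_failure_condition_met(redundant_policy: str, failures: int) -> bool:
    return failures <= MAX_FAILURES[redundant_policy]
```

Counting by storage node rather than by individual drive is what makes both cases in Note 2 come out as one failure.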
If the conditions described in the preceding table are met, the procedure for verifying the conditions for performing storage node maintenance blocking is complete.
If the conditions are not met, take the following action. Taking the following action completes the procedure for verifying the conditions for performing storage node maintenance blocking.
-
If the VSP One SDS Block Administrator displays "Alerting" for "Health status" of a storage node, contact customer support.
-
If there are storage nodes whose STATUS is MaintenanceBlockage, perform maintenance recovery for the nodes according to the procedure in Performing maintenance recovery for storage nodes.
-
In the Protection Domain window, check the following statuses to verify that Rebuild operation is not being performed and that no error occurred during the Rebuild operation.
If Rebuild operation is being performed or an error occurred during the Rebuild operation, take the action described for the corresponding status.
-
REBUILD STATUS
-
Stopped: Status in which Rebuild operation is not being performed.
-
Running: Status in which Rebuild operation is being performed. The Rebuild operation cannot be stopped. Wait until the Rebuild operation is completed, and then verify the Rebuild status in the Protection Domain window again.
-
Error: Status in which Rebuild processing cannot be performed due to an error. Verify the event logs and perform troubleshooting.
-
-
REBUILD PROGRESS RATE
Shows the progress rate (%) of Rebuild operation. The progress rate is updated when it fluctuates by 1 point or more. (When progress is made in a short period of time, such as when Fast Rebuild is performed, the progress rate might be updated by several points rather than by 1 point.)
-
-