Replace the faulty drive with another drive.
CAUTION:
-
When you perform the following maintenance operations, do not
perform multiple operations for the same storage cluster
simultaneously.
- Adding storage nodes
- Replacing storage nodes
- Exporting the configuration file
- Adding drives
- Replacing drives
-
If you are using EBS encryption, you must grant permission to
access AWS Key Management Service to the IAM role that is set for the
account or controller node used to operate AWS. For details, see the AWS
user guide.
Configuration files are used for drive replacement. To export the
configuration files, log in to the controller node and run the
VSP One SDS Block installer, in the same way as for drive addition.
Notes on the drive auto-recovery function
The cloud model has a drive auto-recovery function, which automatically restores
a drive that failed due to an EBS volume response delay caused by the cloud platform
(AWS).
When the conditions for activating the drive auto-recovery function are
met, you do not need to replace the failed drive. Just wait until the drive is
restored automatically.
To determine whether you need to wait until the drive is restored
automatically or replace the drive, see Action to be taken when
“Alerting” is shown in the drive health status in the VSP One SDS Block Troubleshooting Guide.
In the following cases, drives are not automatically restored. You need
to perform maintenance for drives as described in this section.
-
The drive failure is not caused by an EBS volume response
delay.
If the drive failure is caused by an EBS volume response delay, a
KARS05012-E message is output to the event log.
-
The rebuild capacity allocation status is other than “Sufficient”
and you cannot perform the rebuild.
However, even if the rebuild capacity allocation status is other
than “Sufficient”, the rebuild might still be possible
depending on which drive failed. In that case, drive auto-recovery
is also performed after the rebuild is complete. For
details about the allocation status of the rebuild capacity and how to check
it, see Managing storage pools.
-
The capacity for enabling rebuild is not allocated.
-
A drive failure occurred during automatic recovery of a
drive.
The drive auto-recovery function is always enabled and cannot be
disabled.
If drive failures due to EBS volume response delays and drive
auto-recovery occur repeatedly, and you want to replace a drive that is subject to
frequent response delays, see Action to be taken to replace a
drive that is subject to frequent response delays (Cloud) in the VSP One SDS Block Troubleshooting Guide and
take an appropriate action.
Note:
The data of a drive on which drive auto-recovery was performed has already
been reconstructed on another drive by Rebuild. The capacity of the recovered
drive is therefore reserved as free space for Rebuild, and the drive is not
accessed until an event that changes the drive data location (such as Rebuild
or storage pool expansion) occurs.
Note on the procedure
-
In the following procedure, long command lines are split across
multiple lines, with "\" at the end of each continued line. You can copy and
paste these command lines, including the "\", and run them as they are.
-
The procedure in this section uses AWS CLI to perform operations.
However, you can use the AWS CloudFormation console of the AWS Management
Console to confirm the CloudFormation stack status or operation status.
-
Recording console output (for example, by using the script command,
output redirection, or the tee command) can help you confirm execution
results and handle errors.
-
Log in to the controller node.
-
Verify the ID of the faulty drive to be replaced and the ID of the storage node
containing the faulty drive.
Also, note down the serial number of the faulty drive to be
replaced. This will be used to identify the faulty drive.
Run either of the following commands with "Blockage" specified
for the query parameter "status."
REST API: GET /v1/objects/drives
CLI: drive_list
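As a rough illustration of the REST API variant, the following sketch builds the request URL with "Blockage" set for the query parameter "status". The helper name and the base URL are hypothetical, and authentication is omitted because it is not described in this section.

```shell
# Hypothetical helper (not part of VSP One SDS Block): builds the URL for
# GET /v1/objects/drives filtered by drive status.
drives_by_status_url() {
  base_url="${1%/}"   # assumed API base URL; a trailing slash is removed
  status="$2"         # for example: Blockage
  printf '%s/v1/objects/drives?status=%s\n' "$base_url" "$status"
}

# Example call (authentication options are omitted; add what your
# environment requires):
#   curl -s "$(drives_by_status_url "https://<storage-cluster-address>" Blockage)"
```

From the response, note the drive ID, the storage node ID, and the serial number of the faulty drive.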
-
Verify the status of the storage node containing the faulty drive.
Run either of the following commands with the ID of the storage
node containing the faulty drive specified.
REST API: GET /v1/objects/storage-nodes/<id>
CLI: storage_node_show
If the status of the storage node is "Ready" or "RemovalFailed",
go to the next step.
-
Access the AWS Management Console and navigate to the EC2 console.
In the navigation pane on the left side, click Volumes.
-
Enter the serial number of the faulty drive (with "vol" followed by a hyphen)
in the volume search box to verify the drive number to be replaced.
Note down the drive number that you verified. You will need this
number in step 11.
(Example) If the drive number is 1, it will look like this:
<cluster-name-specified-for-parameter-at-the-time-of-configuration>_SN02_UserDataDisk01
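If you prefer to script this lookup, the following sketch shows one possible way to turn the serial number into a search string and to read the drive number from the volume name. Both function names are hypothetical. The sketch assumes that the serial number matches the EBS volume ID with or without the hyphen after "vol", and that the drive number is the trailing number after "UserDataDisk", as in the example above.

```shell
# Assumption: the serial number is the EBS volume ID, possibly without the
# hyphen after "vol". Normalizes it to the "vol-..." form used for searching.
serial_to_volume_id() {
  case "$1" in
    vol-*) printf '%s\n' "$1" ;;
    vol*)  printf 'vol-%s\n' "${1#vol}" ;;
    *)     printf 'vol-%s\n' "$1" ;;
  esac
}

# Extracts the drive number from a name that ends in "UserDataDisk<NN>".
drive_number_from_name() {
  case "$1" in
    *UserDataDisk[0-9]*) ;;
    *) return 1 ;;
  esac
  digits="${1##*UserDataDisk}"
  # Strip leading zeros so that "01" becomes "1".
  printf '%s\n' "$digits" | sed 's/^0*\([0-9]\)/\1/'
}

# Example lookup with the AWS CLI (assumes the name shown in the console is
# stored in the "Name" tag of the volume):
#   aws ec2 describe-volumes \
#     --volume-ids "$(serial_to_volume_id "<serial-number>")" \
#     --query "Volumes[0].Tags[?Key=='Name'].Value | [0]" \
#     --output text
```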
-
See Exporting the configuration file (Cloud) to obtain
configuration files from VSP One SDS Block for drive
replacement.
When you export the configuration files, make sure to specify
"ReplaceDrive" for the --mode option, and specify the ID of the faulty drive to
be replaced (that you confirmed in step 2) for the --drive_id option. This
step is mandatory and ensures that you use the latest configuration
files.
To replace multiple drives, repeat steps 6 to 21 for each
drive.
CAUTION:
Do not edit the configuration files you obtained by
configuration file export. If you do so, drive replacement might be
unsuccessful. If you specified an incorrect value, set the correct
value, and then retry configuration file export.
-
Store the set of configuration files obtained
by configuration file export to the Amazon S3 bucket you specified for the
--template_s3_url option in step 6.
One way to store the files in Amazon S3 is to copy them
by using the AWS CLI. For details about the procedure, see
Example Amazon S3 operations in the
VSP One SDS Block Cloud Setup and Configuration Guide.
CAUTION:
Do not include periods (.) in the name of the Amazon S3
bucket in which VM configuration files are to be stored.
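The bucket name restriction above can be checked before copying. The following is a minimal sketch; the function name, bucket name, prefix, and local directory are placeholders, not values defined by VSP One SDS Block.

```shell
# Returns success only when the bucket name is non-empty and contains no
# period (.), as required by the CAUTION above.
bucket_name_has_no_period() {
  case "$1" in
    ""|*.*) return 1 ;;
    *)      return 0 ;;
  esac
}

# Example use (all names are placeholders):
#   bucket="<bucket-name>"
#   bucket_name_has_no_period "$bucket" || {
#     echo "bucket name must not contain a period" >&2
#     exit 1
#   }
#   aws s3 cp "<directory-with-exported-files>" "s3://$bucket/<prefix>/" --recursive
```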
-
Run the following AWS CLI command to create a change set. If you replace
multiple drives, repeat steps 6 to 17.
aws cloudformation create-change-set \
--stack-name <stack-name-set-during-installation-from-Marketplace> \
--change-set-name <any-change-set-name*> \
--template-url <Amazon-S3-URL(https)-of-VMConfigurationFile.yml> \
--include-nested-stacks \
--capabilities CAPABILITY_NAMED_IAM
*For details about the characters that can be used, see the
following website.
https://docs.aws.amazon.com/cli/latest/reference/cloudformation/create-change-set.html
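As a convenience, the change set name can be checked before running the command. The sketch below assumes the naming rule documented by AWS at the time of writing (alphanumeric characters and hyphens only, starting with a letter, at most 128 characters). The function name is hypothetical; see the page above for the current rule.

```shell
# Checks a candidate change set name against the assumed AWS rule:
# starts with a letter, then letters, digits, or hyphens, 128 characters max.
valid_change_set_name() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9-]{0,127}$'
}

# Example use:
#   valid_change_set_name "replace-drive-01" || echo "invalid change set name" >&2
```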
-
Run the following AWS CLI command to view the change set (first layer).
aws cloudformation describe-change-set \
--stack-name <stack-name-set-during-installation-from-Marketplace> \
--change-set-name <change-set-name-specified-in-step-8>
Verify the following.
-
The status must be "CREATE_COMPLETE."
If the status is "CREATE_IN_PROGRESS", wait for a while,
and then retry the operation.
-
The number of items in "Changes" must be one.
-
About "ResourceChange" in "Changes":
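The status and item-count checks in this step can also be scripted. The following sketch covers only those two checks; the function name is hypothetical, and treating "CREATE_PENDING" in the same way as "CREATE_IN_PROGRESS" is an assumption that is not stated in this procedure.

```shell
# Return codes: 0 = ready to continue, 2 = still being created (wait and
# retry), 1 = unexpected status or unexpected number of items in "Changes".
change_set_ready() {
  status="$1"
  change_count="$2"
  case "$status" in
    CREATE_COMPLETE)
      if [ "$change_count" -eq 1 ]; then
        return 0
      fi
      return 1
      ;;
    CREATE_PENDING|CREATE_IN_PROGRESS)   # CREATE_PENDING is an assumption
      return 2
      ;;
    *)
      return 1
      ;;
  esac
}

# Example of obtaining the inputs with the AWS CLI (placeholders in <>):
#   status=$(aws cloudformation describe-change-set \
#     --stack-name "<stack-name>" --change-set-name "<change-set-name>" \
#     --query 'Status' --output text)
#   count=$(aws cloudformation describe-change-set \
#     --stack-name "<stack-name>" --change-set-name "<change-set-name>" \
#     --query 'length(Changes)' --output text)
#   change_set_ready "$status" "$count"
#
# The ChangeSetId to record for the next step can be read in the same way:
#   --query 'Changes[0].ResourceChange.ChangeSetId' --output text
```

Instead of retrying manually, you can also run "aws cloudformation wait change-set-create-complete" with the same --stack-name and --change-set-name options to wait until creation completes.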
-
Run the following AWS CLI command to view the change set (second layer).
aws cloudformation describe-change-set \
--change-set-name <ChangeSetId-recorded-in-step-9>
Verify the following.
-
Run the following AWS CLI command to view the change set of stacks for each
storage node (third layer).
aws cloudformation describe-change-set \
--change-set-name <ChangeSetId-recorded-in-step-10>
In the case of the storage node specified for
RemoveDriveNodeNumber, verify the following.
Verify that "Action" is
"Modify" for "ResourceChange" of all items in "Changes", other than the
preceding ones.
-
Run the following AWS CLI command to execute the created change set.
aws cloudformation execute-change-set \
--stack-name <stack-name-set-during-installation-from-Marketplace> \
--change-set-name <change-set-name-specified-in-step-8>
-
Run the following AWS CLI command to verify the execution results of the change
set.
If you run "wait stack-update-complete", you can wait until the
change-set execution completes.
aws cloudformation wait stack-update-complete \
--stack-name <stack-name-set-during-installation-from-Marketplace>
After the change-set execution completes, run "describe-stacks"
to verify that "StackStatus" is "UPDATE_COMPLETE."
aws cloudformation describe-stacks \
--stack-name <stack-name-set-during-installation-from-Marketplace>
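The two commands in this step can be combined into one check, for example as follows. This is a minimal sketch; the function name is hypothetical, and the --query expression only extracts the "StackStatus" value of the first stack returned.

```shell
# Waits for the stack update to finish, then confirms that the resulting
# StackStatus is UPDATE_COMPLETE. Returns non-zero otherwise.
verify_stack_updated() {
  stack_name="$1"
  aws cloudformation wait stack-update-complete \
    --stack-name "$stack_name" || return 1
  stack_status="$(aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].StackStatus' \
    --output text)" || return 1
  [ "$stack_status" = "UPDATE_COMPLETE" ]
}

# Example use:
#   verify_stack_updated "<stack-name-set-during-installation-from-Marketplace>" \
#     || echo "stack update did not complete normally" >&2
```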
-
See Exporting the configuration file (Cloud) to obtain
configuration files from VSP One SDS Block for drive
replacement.
When you export the configuration files, make sure to specify
"ReplaceDrive" for the --mode option, and specify the --recover_single_drive
option. This step is mandatory and ensures that you use the latest
configuration files.
CAUTION:
Do not edit the configuration files you obtained by
configuration file export. If you do so, drive replacement might be
unsuccessful. If you specified an incorrect value, set the correct
value, and then retry configuration file export.
-
Store the set of configuration files obtained
by configuration file export to the Amazon S3 bucket you specified for the
--template_s3_url option in step 14.
One way to store the files in Amazon S3 is to copy them by using
the AWS CLI. For details about the procedure, see
Example Amazon S3 operations in the
VSP One SDS Block Cloud Setup and Configuration Guide.
CAUTION:
Do not include periods (.) in the name of the Amazon S3
bucket in which VM configuration files are to be stored.
-
Run the following AWS CLI command to create a change set.
aws cloudformation create-change-set \
--stack-name <stack-name-set-during-installation-from-Marketplace> \
--change-set-name <any-change-set-name*> \
--template-url <Amazon-S3-URL(https)-of-VMConfigurationFile.yml> \
--include-nested-stacks \
--capabilities CAPABILITY_NAMED_IAM
*For details about the characters that can be used, see the
following website.
https://docs.aws.amazon.com/cli/latest/reference/cloudformation/create-change-set.html
-
Run the following AWS CLI command to view the change set (first layer).
aws cloudformation describe-change-set \
--stack-name <stack-name-set-during-installation-from-Marketplace> \
--change-set-name <change-set-name-specified-in-step-16>
Verify the following.
-
"Status" must be "CREATE_COMPLETE."
If "Status" is "CREATE_IN_PROGRESS", wait for a while,
and then retry the operation.
-
The number of items in "Changes" must be one.
-
About "ResourceChange" in "Changes":
-
Run the following AWS CLI command to view the change set (second layer).
aws cloudformation describe-change-set \
--change-set-name <ChangeSetId-recorded-in-step-17>
Verify the following.
-
Run the following AWS CLI command to view the change set of stacks for each
storage node (third layer).
aws cloudformation describe-change-set \
--change-set-name <ChangeSetId-recorded-in-step-18>
Verify the following.
-
Run the following AWS CLI command to execute the created change set.
aws cloudformation execute-change-set \
--stack-name <stack-name-set-during-installation-from-Marketplace> \
--change-set-name <change-set-name-specified-in-step-16>
-
Run the following AWS CLI command to verify the execution results of the change
set.
If you run "wait stack-update-complete", you can wait until the
change-set execution completes.
aws cloudformation wait stack-update-complete \
--stack-name <stack-name-set-during-installation-from-Marketplace>
After the change-set execution completes, run "describe-stacks"
to verify that "StackStatus" is "UPDATE_COMPLETE."
aws cloudformation describe-stacks \
--stack-name <stack-name-set-during-installation-from-Marketplace>
-
Obtain a list of drives to verify that a new
drive has been added.
REST API: GET /v1/objects/drives
CLI: drive_list
Tip: A drive whose status is "Offline"
is the new drive. You can verify the drive status from drive
information.
-
Perform steps 1 to 5 of Expanding storage pools in
Adding drives.
-
Verify the state of the write back mode with cache protection.
REST API: GET /v1/objects/storage
CLI: storage_show
Take the following action according to the state of the write
back mode with cache protection (writeBackModeWithCacheProtection).
-
If the state is "Disabled" or "Enabling", go to step 26.
-
If the state is "Enabled" or "Disabling", go to the next
step.
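The branching in this step can be summarized as a small lookup, sketched below. The function name is hypothetical; the state values and step numbers are the ones given above.

```shell
# Prints the number of the step to perform next for a given value of
# writeBackModeWithCacheProtection.
next_step_for_cache_protection_state() {
  case "$1" in
    Disabled|Enabling) echo 26 ;;
    Enabled|Disabling) echo 25 ;;
    *)
      echo "unexpected state: $1" >&2
      return 1
      ;;
  esac
}

# Example use:
#   next_step_for_cache_protection_state "Enabled"
```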
-
See Confirming metadata redundancy for cache
protection to verify that metadata redundancy for cache protection is
not degraded.
When the redundancy is not degraded, go to the next step.
When the redundancy is degraded, wait until it has been
recovered. If event log KARS06596-E is output, take action according to the
event log. After taking action, perform step 25 again.
Note:
If the storage node is blocked, the metadata redundancy for
cache protection is not recovered until the storage node is recovered
by a maintenance operation. Recover the blocked storage node first by
performing the maintenance operation.
-
See Verifying Rebuild status and determine whether the
Rebuild is being performed or whether an error has occurred during the
Rebuild.
If the Rebuild is not being performed and no error has occurred,
go to the next step. If the Rebuild is being performed or an error has
occurred during the Rebuild, take appropriate action (see Verifying Rebuild status).
Note:
Before proceeding to the next step, obtain a list of the
drives and verify that the faulty drive to be removed exists. If the
faulty drive to be removed does not exist, the procedure ends
here.
For details about how to verify the target faulty drive to be
removed, see step 2.
-
Remove the faulty drive.
Run either of the following commands with the ID of the faulty
drive (confirmed in step 2) specified.
REST API: POST /v1/objects/drives/<id>/actions/remove/invoke
CLI: drive_remove
Verify the job ID that is displayed after the command is
run.
-
Verify the state of the job.
Run either of the following commands with the job ID
specified.
REST API: GET /v1/objects/jobs/<jobId>
CLI: job_show
After running the command, if you receive a response indicating
"Succeeded" as the state, the job is completed.
-
Obtain a list of drives and verify that the faulty drive has
been removed.
After step 27, removal of the drive might take approximately one
minute.
REST API: GET /v1/objects/drives
CLI: drive_list
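Waiting for the removal job (step 28) before listing the drives again can be scripted as a simple polling loop. The following is only a sketch. Both function names are hypothetical, and get_job_state is a placeholder that you would replace with a call to GET /v1/objects/jobs/<jobId> or job_show that prints the job state. Job states other than "Succeeded" are not listed in this section, so the loop simply keeps polling until the state is "Succeeded" or the retry limit is reached.

```shell
# Placeholder: replace with a command that prints the state of the given job.
get_job_state() {
  echo "get_job_state is not implemented" >&2
  return 1
}

# Polls the job state. Return codes: 0 = Succeeded, 1 = retry limit reached,
# 2 = the job state could not be obtained.
wait_for_job() {
  job_id="$1"
  max_tries="${2:-30}"   # assumed default: 30 attempts
  interval="${3:-10}"    # assumed default: 10 seconds between attempts
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    job_state="$(get_job_state "$job_id")" || return 2
    if [ "$job_state" = "Succeeded" ]; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "$interval"
  done
  return 1
}

# Example use:
#   wait_for_job "<jobId>" && echo "removal job completed"
```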