Replacing drives (Cloud(M))

Virtual Storage Platform One SDS Block Storage Administrator Guide

Version
1.17.x
Part Number
MK-24VSP1SDS002-04

Replace the faulty drive with another drive.

CAUTION:
  • When you perform the following maintenance operations, do not perform multiple operations for the same storage cluster simultaneously.

    • Adding storage nodes
    • Replacing storage nodes
    • Exporting the configuration file
    • Adding drives
    • Replacing drives
  • If you are using EBS encryption, the IAM role set for the account or controller node that operates AWS must have permission to access AWS Key Management Service. For details, see the AWS user guide.

Drive replacement uses configuration files. To export them, log in to the controller node and run the VSP One SDS Block installer, in the same way as when adding drives.

Notes on the drive auto-recovery function

The cloud model has a drive auto-recovery function that automatically restores a drive that failed because of an EBS volume response delay caused by the cloud platform (AWS).

When the conditions for activating the drive auto-recovery function are met, you do not need to replace the failed drive. Just wait until the drive is restored automatically.

To determine whether you need to wait until the drive is restored automatically or replace the drive, see Action to be taken when “Alerting” is shown in the drive health status in the VSP One SDS Block Troubleshooting Guide.

In the following cases, drives are not restored automatically. Perform drive maintenance as described in this section.

  • The drive failure is not caused by an EBS volume response delay.

    If the drive failure is caused by an EBS volume response delay, a KARS05012-E message is output to the event log.

  • The rebuild capacity allocation status is other than “Sufficient” and you cannot perform the rebuild.

    However, even if the rebuild capacity allocation status is other than “Sufficient”, the rebuild might still be possible depending on which drive failed. In that case, drive auto-recovery is performed after the rebuild is complete. For details about the allocation status of the rebuild capacity and how to check it, see Managing storage pools.

  • The capacity for enabling rebuild is not allocated.

  • A drive failure occurred during automatic recovery of a drive.

The drive auto-recovery function is always enabled and cannot be disabled.

If drive failures due to EBS volume response delays and drive auto-recovery occur repeatedly, and you want to replace a drive that is subject to frequent response delays, see Action to be taken to replace a drive that is subject to frequent response delays (Cloud) in the VSP One SDS Block Troubleshooting Guide and take an appropriate action.

Note:

Data on a drive restored by drive auto-recovery has already been relocated to other drives by Rebuild. The capacity of the recovered drive is therefore reserved as free space for Rebuild and is not accessed until an event that changes the data location (such as Rebuild or storage pool expansion) occurs.

  • Required role: Storage

  • The VSP One SDS Block installer must be installed on the controller node intended for use.

Note on the procedure

  • In the following procedure, long command lines are split across multiple lines, each ending with "\". You can copy and paste the commands, including "\", as they are.

  • The procedure in this section uses AWS CLI to perform operations. However, you can use the AWS CloudFormation console of the AWS Management Console to confirm the CloudFormation stack status or operation status.

  • Recording console output (for example, by using the script command, output redirection, or the tee command) can help you confirm execution results and handle errors.
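As a minimal sketch of the last point, the following shell function runs a command and appends its output to a log file by using tee. The function name run_logged and the log file name are illustrative assumptions and are not part of the product.

```shell
# Hypothetical helper: run a command, show its output on the console,
# and append stdout and stderr to a log file for later review.
LOG_FILE="${LOG_FILE:-drive-replacement.log}"

run_logged() {
    # Record the command line itself so the log shows what was run.
    printf '$ %s\n' "$*" >> "$LOG_FILE"
    # 2>&1 merges stderr into stdout so that tee captures errors too.
    # Note: the exit status of this pipeline is that of tee, not of the command.
    "$@" 2>&1 | tee -a "$LOG_FILE"
}
```

For example, "run_logged aws cloudformation describe-stacks --stack-name <stack-name>" would print the result and keep a copy in the log file.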

  1. Log in to the controller node.
  2. Verify the ID of the faulty drive to be replaced and the ID of the storage node containing the faulty drive.

    Also, note down the serial number of the faulty drive to be replaced. This will be used to identify the faulty drive.

    Run either of the following commands with "Blockage" specified for the query parameter "status".

    REST API: GET /v1/objects/drives

    CLI: drive_list
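The REST request in this step might be issued with curl as sketched below. The variables SDS_URL, SDS_USER, and SDS_PASS and the use of HTTP basic authentication are assumptions made for illustration; check the REST API reference for the authentication method used in your environment.

```shell
# Assumptions (not from this guide): SDS_URL is the base URL of the
# VSP One SDS Block REST API, and SDS_USER / SDS_PASS are credentials
# accepted through HTTP basic authentication.
list_blocked_drives() {
    # "status=Blockage" is the query parameter named in this step.
    curl -sS -u "$SDS_USER:$SDS_PASS" \
        "$SDS_URL/v1/objects/drives?status=Blockage"
}
```

The response is where you would look up the drive ID, the storage node ID, and the serial number needed in the later steps.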

  3. Verify the status of the storage node containing the faulty drive.

    Run either of the following commands with the ID of the storage node containing the faulty drive specified.

    REST API: GET /v1/objects/storage-nodes/<id>

    CLI: storage_node_show

    If the status of the storage node is "Ready" or "RemovalFailed", go to the next step.

  4. Access the AWS Management Console and navigate to the EC2 console.

    In the navigation pane on the left side, click Volumes.

  5. In the volume search box, enter the serial number of the faulty drive, with a hyphen inserted after "vol", and verify the number of the drive to be replaced.

    Note down the drive number that you verified. You will need this number in step 11.

    (Example) If the drive number is 1, the volume name looks like this:

    <cluster-name-specified-for-parameter-at-the-time-of-configuration>_SN02_UserDataDisk01
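The same lookup can be sketched with the AWS CLI instead of the console. The helper names below are hypothetical, and the sketch assumes that the serial number has the form vol0123456789abcdef and that the volume name shown in the example is stored in the "Name" tag of the EBS volume.

```shell
# Insert a hyphen after the leading "vol" of a drive serial number
# to obtain the EBS volume ID (an ID that already has one is unchanged).
serial_to_volume_id() {
    printf '%s\n' "$1" | sed 's/^vol-\{0,1\}/vol-/'
}

# Print the value of the Name tag of a volume (assumed to hold the
# <cluster-name>_SNnn_UserDataDisknn name shown in the example).
volume_name() {
    aws ec2 describe-volumes --volume-ids "$1" \
        --query 'Volumes[0].Tags[?Key==`Name`].Value | [0]' --output text
}

# Print "<storage-node-number> <drive-number>" for a volume name of the
# form <cluster-name>_SNnn_UserDataDisknn; print nothing if it does not match.
parse_volume_name() {
    printf '%s\n' "$1" |
        sed -n 's/^.*_SN\([0-9][0-9]*\)_UserDataDisk\([0-9][0-9]*\)$/\1 \2/p'
}
```

For example, parse_volume_name "$(volume_name "$(serial_to_volume_id <serial-number>)")" would print the storage node number and the drive number on one line.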
  6. See Exporting the configuration file (Cloud) to obtain configuration files from VSP One SDS Block for drive replacement.

    When you export the configuration files, be sure to specify "ReplaceDrive" for the --mode option, and specify the ID of the faulty drive to be replaced (confirmed in step 2) for the --drive_id option. This step is mandatory and ensures that you use the latest configuration files.

    To replace multiple drives, repeat steps 6 to 21 for each drive.

    CAUTION:

    Do not edit the configuration files obtained by configuration file export. If you do so, drive replacement might fail. If you specified an incorrect value during export, specify the correct value, and then retry configuration file export.

  7. Store the set of configuration files obtained by configuration file export to the Amazon S3 bucket you specified for the --template_s3_url option in step 6.
    One way to do this is to copy the files to Amazon S3 by using the AWS CLI. For details about the procedure, see Example Amazon S3 operations in the VSP One SDS Block Cloud Setup and Configuration Guide.
    CAUTION:

    Do not include periods (.) in the name of the Amazon S3 bucket in which the VM configuration files are stored.
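A minimal sketch of this step with the AWS CLI follows. The function names and the directory, bucket, and prefix arguments are hypothetical; the authoritative procedure is the one in the referenced guide.

```shell
# Return success only if the bucket name contains no period.
bucket_name_ok() {
    case "$1" in
        *.*) return 1 ;;
        *)   return 0 ;;
    esac
}

# Copy every file in a local directory to s3://<bucket>/<prefix>/.
# Arguments: <local-directory> <bucket-name> <prefix>
upload_config_files() (
    src_dir=$1
    bucket=$2
    prefix=$3
    if ! bucket_name_ok "$bucket"; then
        echo "bucket name must not contain periods: $bucket" >&2
        exit 1
    fi
    aws s3 cp "$src_dir" "s3://$bucket/$prefix/" --recursive
)
```

The bucket and prefix used here must correspond to the location specified for the --template_s3_url option during export.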

  8. Run the following AWS CLI command to create a change set. If you replace multiple drives, repeat steps 6 to 21 for each drive, as noted in step 6.
    aws cloudformation create-change-set \
    --stack-name <stack-name-set-during-installation-from-Marketplace> \
    --change-set-name <any-change-set-name*> \
    --template-url <Amazon-S3-URL(https)-of-VMConfigurationFile.yml> \
    --include-nested-stacks \
    --capabilities CAPABILITY_NAMED_IAM

    *For details about the characters that can be used, see the following website.

    https://docs.aws.amazon.com/cli/latest/reference/cloudformation/create-change-set.html

  9. Run the following AWS CLI command to view the change set (first layer).
    aws cloudformation describe-change-set \
    --stack-name <stack-name-set-during-installation-from-Marketplace> \
    --change-set-name <change-set-name-specified-in-step-8>

    Verify the following.

    • The status must be "CREATE_COMPLETE".

      If the status is "CREATE_IN_PROGRESS", wait for a while, and then retry the operation.

    • The number of items in "Changes" must be one.

    • About "ResourceChange" in "Changes":

      • "Action" must be "Modify".

      • Note down the value of "ChangeSetId" for use in the next step.
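The checks in this step can be sketched as a shell function that uses the AWS CLI --query and --output text options together with the change-set-create-complete waiter. The function name check_first_layer is hypothetical, and the sketch is an illustration rather than a replacement for reviewing the command output yourself.

```shell
# Check the first-layer change set and print the nested ChangeSetId.
# Arguments: <stack-name> <change-set-name>
check_first_layer() (
    stack=$1
    name=$2
    # Block until the change set is created (fails if creation fails).
    aws cloudformation wait change-set-create-complete \
        --stack-name "$stack" --change-set-name "$name" || exit 1
    # One tab-separated line: Status, number of changes, Action, ChangeSetId.
    summary=$(aws cloudformation describe-change-set \
        --stack-name "$stack" --change-set-name "$name" \
        --query '[Status, length(Changes), Changes[0].ResourceChange.Action, Changes[0].ResourceChange.ChangeSetId]' \
        --output text) || exit 1
    # Split the line into $1..$4 (none of the values contain whitespace).
    set -- $summary
    if [ "$1" = "CREATE_COMPLETE" ] && [ "$2" = "1" ] && [ "$3" = "Modify" ]; then
        echo "$4"
    else
        echo "unexpected change set: $summary" >&2
        exit 1
    fi
)
```

The printed value is the "ChangeSetId" to pass to the next describe-change-set command.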

  10. Run the following AWS CLI command to view the change set (second layer).
    aws cloudformation describe-change-set \
    --change-set-name <ChangeSetId-recorded-in-step-9>

    Verify the following.

    • About "ResourceChange" for each item in "Changes":

      • "Action" must be "Modify".

      • "LogicalResourceId" must be "StorageNodeXX" (where XX is the number of the storage node containing the faulty drive).

      • Note down the value of "ChangeSetId" for each item for use in the next step.

  11. Run the following AWS CLI command to view the change set of stacks for each storage node (third layer).
    aws cloudformation describe-change-set \
    --change-set-name <ChangeSetId-recorded-in-step-10>

    For the storage node specified for RemoveDriveNodeNumber, verify the following.

    • About "ResourceChange" for each item in "Changes":

      • For the items whose "Action" is "Remove", "LogicalResourceId" must be "VolumeAttachmentXX" or "UserDataDiskXX" (where XX is the number of the specified faulty drive).

    Verify that "Action" is "Modify" for "ResourceChange" of all items in "Changes", other than the preceding ones.
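The third-layer conditions can be sketched as the following check. The function name check_removed_resources is hypothetical, and the sketch assumes that the drive number is passed in the same two-digit form that appears in the resource names (for example, 01).

```shell
# Verify the third-layer change set for the drive being removed.
# Arguments: <ChangeSetId-from-step-10> <drive-number, for example 01>
check_removed_resources() (
    change_set_id=$1
    drive_no=$2
    # One line per change: "<Action><TAB><LogicalResourceId>".
    changes=$(aws cloudformation describe-change-set \
        --change-set-name "$change_set_id" \
        --query 'Changes[].ResourceChange.[Action, LogicalResourceId]' \
        --output text) || exit 1
    printf '%s\n' "$changes" | awk -v n="$drive_no" '
        # "Remove" is allowed only for the resources of the faulty drive.
        $1 == "Remove" {
            if ($2 != ("VolumeAttachment" n) && $2 != ("UserDataDisk" n)) bad = 1
            next
        }
        # Every other change must be a "Modify".
        NF > 0 && $1 != "Modify" { bad = 1 }
        END { if (bad) exit 1 }
    '
)
```

A nonzero exit status means that at least one change does not match the conditions listed in this step, so the change set should not be executed.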

  12. Run the following AWS CLI command to execute the created change set.
    aws cloudformation execute-change-set \
    --stack-name <stack-name-set-during-installation-from-Marketplace> \
    --change-set-name <change-set-name-specified-in-step-8>
  13. Run the following AWS CLI command to verify the execution results of the change set.

    The "wait stack-update-complete" command blocks until the change-set execution completes.

    aws cloudformation wait stack-update-complete \
    --stack-name <stack-name-set-during-installation-from-Marketplace>

    After the change-set execution completes, run "describe-stacks" to verify that "StackStatus" is "UPDATE_COMPLETE".

    aws cloudformation describe-stacks \
    --stack-name <stack-name-set-during-installation-from-Marketplace>
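The two commands in this step can be combined as sketched below, using the AWS CLI --query option to print only "StackStatus". The function name wait_for_stack_update is hypothetical.

```shell
# Wait for the stack update to finish, then confirm the final status.
# Argument: <stack-name>
wait_for_stack_update() (
    stack=$1
    # Blocks until the update completes; returns nonzero on failure or timeout.
    aws cloudformation wait stack-update-complete --stack-name "$stack" || exit 1
    status=$(aws cloudformation describe-stacks --stack-name "$stack" \
        --query 'Stacks[0].StackStatus' --output text) || exit 1
    echo "StackStatus: $status"
    # The function succeeds only if the status is UPDATE_COMPLETE.
    [ "$status" = "UPDATE_COMPLETE" ]
)
```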
  14. See Exporting the configuration file (Cloud) to obtain configuration files from VSP One SDS Block for drive replacement.

    When you export the configuration files, be sure to specify "ReplaceDrive" for the --mode option, and also specify the --recover_single_drive option. This step is mandatory and ensures that you use the latest configuration files.

    CAUTION:

    Do not edit the configuration files obtained by configuration file export. If you do so, drive replacement might fail. If you specified an incorrect value during export, specify the correct value, and then retry configuration file export.

  15. Store the set of configuration files obtained by configuration file export to the Amazon S3 bucket you specified for the --template_s3_url option in step 14.
    One way to do this is to copy the files to Amazon S3 by using the AWS CLI. For details about the procedure, see Example Amazon S3 operations in the VSP One SDS Block Cloud Setup and Configuration Guide.
    CAUTION:

    Do not include periods (.) in the name of the Amazon S3 bucket in which the VM configuration files are stored.

  16. Run the following AWS CLI command to create a change set.
    aws cloudformation create-change-set \
    --stack-name <stack-name-set-during-installation-from-Marketplace> \
    --change-set-name <any-change-set-name*> \
    --template-url <Amazon-S3-URL(https)-of-VMConfigurationFile.yml> \
    --include-nested-stacks \
    --capabilities CAPABILITY_NAMED_IAM

    *For details about the characters that can be used, see the following website.

    https://docs.aws.amazon.com/cli/latest/reference/cloudformation/create-change-set.html

  17. Run the following AWS CLI command to view the change set (first layer).
    aws cloudformation describe-change-set \
    --stack-name <stack-name-set-during-installation-from-Marketplace> \
    --change-set-name <change-set-name-specified-in-step-16>

    Verify the following.

    • "Status" must be "CREATE_COMPLETE".

      If "Status" is "CREATE_IN_PROGRESS", wait for a while, and then retry the operation.

    • The number of items in "Changes" must be one.

    • About "ResourceChange" in "Changes":

      • "Action" must be "Modify".

      • Note down the value of "ChangeSetId" for use in the next step.

  18. Run the following AWS CLI command to view the change set (second layer).
    aws cloudformation describe-change-set \
    --change-set-name <ChangeSetId-recorded-in-step-17>

    Verify the following.

    • About "ResourceChange" for each item in "Changes":

      • "Action" must be "Modify".

      • "LogicalResourceId" must be "StorageNodeXX" (where XX is the number of the storage node containing the faulty drive).

      • Note down the value of "ChangeSetId" for each item for use in the next step.

  19. Run the following AWS CLI command to view the change set of stacks for each storage node (third layer).
    aws cloudformation describe-change-set \
    --change-set-name <ChangeSetId-recorded-in-step-18>

    Verify the following.

    • About "ResourceChange" for each item in "Changes":

      • For the items whose "Action" is "Add", "LogicalResourceId" must be "VolumeAttachmentXX" or "UserDataDiskXX" (where XX is the number of the specified faulty drive).

      • "Action" for all other items must be "Modify".

  20. Run the following AWS CLI command to execute the created change set.
    aws cloudformation execute-change-set \
    --stack-name <stack-name-set-during-installation-from-Marketplace> \
    --change-set-name <change-set-name-specified-in-step-16>
  21. Run the following AWS CLI command to verify the execution results of the change set.

    The "wait stack-update-complete" command blocks until the change-set execution completes.

    aws cloudformation wait stack-update-complete \
    --stack-name <stack-name-set-during-installation-from-Marketplace>

    After the change-set execution completes, run "describe-stacks" to verify that "StackStatus" is "UPDATE_COMPLETE".

    aws cloudformation describe-stacks \
    --stack-name <stack-name-set-during-installation-from-Marketplace>
  22. Obtain a list of drives to verify that a new drive has been added.

    REST API: GET /v1/objects/drives

    CLI: drive_list

    Tip: A drive whose status is "Offline" is the new drive. You can verify the drive status from drive information.
  23. Perform steps 1 to 5 of Expanding storage pools in Adding drives.
  24. Verify the state of the write back mode with cache protection.

    REST API: GET /v1/objects/storage

    CLI: storage_show

    Take the following action according to the state of the write back mode with cache protection (writeBackModeWithCacheProtection).

    • If the state is "Disabled" or "Enabling", go to step 26.

    • If the state is "Enabled" or "Disabling", go to the next step.

  25. See Confirming metadata redundancy for cache protection to verify that metadata redundancy for cache protection is not degraded.

    When the redundancy is not degraded, go to the next step.

    When the redundancy is degraded, wait until it has been recovered. If event log KARS06596-E is output, take action according to the event log. After taking action, perform step 25 again.

    Note:

    If the storage node is blocked, the metadata redundancy for cache protection is not recovered until the storage node is recovered by a maintenance operation. Recover the blocked storage node first by performing the maintenance operation.

  26. See Verifying Rebuild status and determine whether the Rebuild is being performed or whether an error has occurred during the Rebuild.

    If the Rebuild is not being performed and no error has occurred, go to the next step. If the Rebuild is being performed or an error has occurred during the Rebuild, take appropriate action (see Verifying Rebuild status).

    Note:

    Before proceeding to the next step, obtain a list of the drives and verify that the faulty drive to be removed still exists. If it does not exist, the procedure ends here.

    For details about how to verify the target faulty drive to be removed, see step 2.

  27. Remove the faulty drive.

    Run either of the following commands with the ID of the faulty drive (confirmed in step 2) specified.

    REST API: POST /v1/objects/drives/<id>/actions/remove/invoke

    CLI: drive_remove

    Note the job ID that is displayed after the command is run.

  28. Verify the state of the job.

    Run either of the following commands with the job ID specified.

    REST API: GET /v1/objects/jobs/<jobId>

    CLI: job_show

    If the response shows "Succeeded" as the state, the job is complete.
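Checking the job repeatedly can be sketched as a simple polling loop. Everything outside the request path is an assumption made for illustration: the variables SDS_URL, SDS_USER, and SDS_PASS, the use of HTTP basic authentication, and a JSON response with a "state" field. The sed expression is a rough extraction that returns the last "state" value found on a line; a JSON parser would be more robust.

```shell
# Assumptions (not from this guide): SDS_URL is the base URL of the REST API,
# SDS_USER / SDS_PASS are accepted through HTTP basic authentication, and
# the job object is JSON that contains a "state" field.
get_job_state() {
    curl -sS -u "$SDS_USER:$SDS_PASS" "$SDS_URL/v1/objects/jobs/$1" |
        sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
        head -n 1
}

# Poll until the job state is "Succeeded" or the attempts run out.
# Arguments: <job-id> [attempts] [seconds-between-attempts]
wait_for_job() (
    job_id=$1
    attempts=${2:-30}
    interval=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        state=$(get_job_state "$job_id")
        echo "job $job_id: ${state:-unknown}"
        [ "$state" = "Succeeded" ] && exit 0
        i=$((i + 1))
        sleep "$interval"
    done
    exit 1
)
```

If the function returns a nonzero status, the job did not reach "Succeeded" within the allotted attempts and its state should be checked manually.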

  29. Obtain a list of drives and verify that the faulty drive has been removed.

    After step 27, removal of the drive might take approximately one minute.

    REST API: GET /v1/objects/drives

    CLI: drive_list