Replacing drives (Bare metal)

Replace the faulty drive with another drive.

Notes on the drive auto-recovery function

The bare metal model provides a drive auto-recovery function that automatically recovers drives from failures caused by drive response delays. When the operating conditions for the drive auto-recovery function are met, you do not need to replace the drive. Wait until drive auto recovery completes.

For details about how to judge whether to wait for drive auto recovery or to replace a drive, see Action to be taken when "Alerting" is shown in the drive health status in the VSP One SDS Block Troubleshooting Guide.

Drive auto recovery will not work in the following cases. Perform drive replacement as described in this section.

  • When the drive failure is not caused by drive response delays

    When the drive failure is caused by drive response delays, event log KARS05012-E is output.

  • When the rebuild capacity allocation status is other than "Sufficient" and Rebuild is inoperable

    However, even if the rebuild capacity allocation status is other than "Sufficient", Rebuild might be operable for some faulty drives. In such a case, drive auto recovery also operates after Rebuild completes. For details about the rebuild capacity allocation status and how to verify the status, see Managing storage pools.

  • When the allocated rebuildable capacity is not sufficient
  • When the drive failure occurred during drive auto recovery
  • When the drive cannot be returned to a state in which the metadata redundancy for cache protection is not degraded

The drive auto-recovery function is always enabled and cannot be disabled.

If you want to replace a drive that is subject to frequent response delays (for example, drive failures caused by drive response delays and subsequent drive auto recovery occur repeatedly), see Action to be taken to replace a drive that is subject to frequent delays in the VSP One SDS Block Troubleshooting Guide, and then take action.

Note:

Data on a drive for which drive auto-recovery was performed has already been placed on another drive by Rebuild. In this case, the capacity of the recovered drive is reserved as free space for Rebuild and is not accessed unless an event that changes the location of drive data (such as Rebuild or storage pool expansion) occurs.

  • Required role: Storage

  1. Verify the ID of the faulty drive to be removed and the ID of the storage node that has the faulty drive.

    Also record the WWID of the faulty drive to be removed. The WWID is used to identify the faulty drive when you remove it from the server.

    Run either of the following commands with "Blockage" specified for the query parameter "status."

    REST API: GET /v1/objects/drives

    CLI: drive_list
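
    For example, the REST API call might look like the following sketch. The management address and credentials are placeholders, and HTTP basic authentication is an assumption; your deployment might require session-token authentication instead.

      curl -s -k -u <user>:<password> \
        "https://<management-address>/v1/objects/drives?status=Blockage"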

  2. Verify the status of the storage node containing the faulty drive.

    Run either of the following commands with the storage node ID containing the faulty drive specified.

    REST API: GET /v1/objects/storage-nodes/<id>

    CLI: storage_node_show

    Go to the next step when the status of the storage node is "Ready" or "RemovalFailed."
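
    For example, assuming the same placeholder address and credentials as in step 1, and assuming the response exposes the state in a "status" attribute, the value could be extracted with jq:

      curl -s -k -u <user>:<password> \
        "https://<management-address>/v1/objects/storage-nodes/<id>" | jq -r '.status'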

  3. Turn on the locator LED of the drive to be removed.

    Run either of the following commands with "TurnOn" specified for the operationType parameter (operation_type in the case of CLI).

    REST API: POST /v1/objects/drives/<id>/actions/control-locator-led/invoke

    CLI: drive_control_locator_led

    Verify the job ID that is displayed after the command is run.
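
    A sketch of the REST API call, with the same placeholder address and credentials as in step 1; the request body carries the operationType parameter described above:

      curl -s -k -u <user>:<password> -X POST \
        -H "Content-Type: application/json" \
        -d '{"operationType": "TurnOn"}' \
        "https://<management-address>/v1/objects/drives/<id>/actions/control-locator-led/invoke"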

    CAUTION:

    If a storage node failure occurs during a drive replacement operation, the locator LED on/off status shown by the REST API, CLI, or VSP One SDS Block Administrator might differ from the actual on/off status of the locator LED on the physical drive. The status shown by the REST API, CLI, or VSP One SDS Block Administrator is updated and corrected after the storage node recovers from the failure.

    Note:

    If the configuration differs from those described in the VSP One SDS Block Hardware Compatibility Reference, locator LED operation might not be available. In this case, confirm the drive location by performing the procedure indicated in the Note in step 5.

  4. Verify the state of the job.

    Run either of the following commands with the job ID specified.

    REST API: GET /v1/objects/jobs/<jobId>

    CLI: job_show

    If the job state is "Succeeded", the job is completed.
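
    If you prefer to poll from a shell, a minimal sketch such as the following could be used. Placeholders are as in step 1, and the "state" attribute name is an assumption based on the "Succeeded" value described above; if the state becomes "Failed", stop polling and troubleshoot the job.

      while true; do
        state=$(curl -s -k -u <user>:<password> \
          "https://<management-address>/v1/objects/jobs/<jobId>" | jq -r '.state')
        echo "job state: ${state}"
        [ "${state}" = "Succeeded" ] && break   # the job is completed
        sleep 10                                # wait before polling again
      done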

  5. On the server, find the drive whose locator LED is lit and confirm its mounting position.

    Then, remove the faulty drive from the server.

    For details, see the documentation of your server vendor.

    CAUTION:

    If a failure occurs during a drive replacement operation, the locator LED might be turned off. In such a case, resume from step 3.

    If you interrupt a drive replacement operation and perform a maintenance operation that requires the storage node to be restarted, the locator LED might be turned off. In such a case, resume from step 3.

    Note:
    • If the locator LED is not lit, confirm the mounting position of the drive to be removed using the following method.

      Match the WWID of the failed drive recorded in step 1 against the WWN or EUI values recorded at the time of drive addition, and then confirm the mounting location that was recorded in association with that WWN or EUI.

    • If the value recorded at the time of drive addition was a WWN, the last 1 to 3 digits of the 16-digit right-hand part of the WWID recorded in step 1 might differ from the recorded WWN.

  6. Perform the steps from Inserting drives (Bare metal) in Adding drives through step 5 of Expanding storage pools.

    Note that the drive you are adding should be a new drive, not the failed drive that you removed in step 5.

    CAUTION:

    If you are replacing multiple drives at the same time, perform steps 1 through 6 (physical drive reduction and expansion operations) one drive at a time. When you have completed step 6 for all drives, perform step 7 and the subsequent steps.

  7. Verify the state of the write back mode with cache protection.

    REST API: GET /v1/objects/storage

    CLI: storage_show

    Take the following action according to the state of the write back mode with cache protection (writeBackModeWithCacheProtection).

    • If the state is "Disabled" or "Enabling", go to step 9.

    • If the state is "Enabled" or "Disabling", go to the next step.
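
    For example, under the same placeholder assumptions as in step 1, the state could be read with jq (the attribute name writeBackModeWithCacheProtection is taken from the description above):

      curl -s -k -u <user>:<password> \
        "https://<management-address>/v1/objects/storage" | jq -r '.writeBackModeWithCacheProtection'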

  8. See Confirming metadata redundancy for cache protection to verify that metadata redundancy for cache protection is not degraded.

    When the redundancy is not degraded, go to the next step.

    When the redundancy is degraded, wait until it has been recovered. If event log KARS06596-E is output, take action according to the event log. After taking action, perform step 8 again.

    Note:

    If the storage node is blocked, the metadata redundancy for cache protection is not recovered unless the storage node is recovered by a maintenance operation. Recover the blocked storage node first by performing a maintenance operation.

  9. See Verifying Rebuild status and determine whether the Rebuild is being performed or whether an error has occurred during the Rebuild.

    If the Rebuild is not being performed and no error has occurred, go to the next step.

    If the Rebuild is being performed or an error has occurred during the Rebuild, take appropriate action (see Verifying Rebuild status).

    Note:

    Before proceeding to the next step, obtain a list of the drives and verify that the target faulty drive exists. If the target faulty drive does not exist, go to step 13. For how to verify the target faulty drive, see step 1.

  10. Remove the faulty drive.

    Run either of the following commands with the faulty drive ID obtained in step 1 specified.

    REST API: POST /v1/objects/drives/<id>/actions/remove/invoke

    CLI: drive_remove

    Verify the job ID that is displayed after the command is run.
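
    A sketch of the REST API call, again with the placeholder address and credentials from step 1 and <id> being the faulty drive ID obtained in step 1:

      curl -s -k -u <user>:<password> -X POST \
        "https://<management-address>/v1/objects/drives/<id>/actions/remove/invoke"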

  11. Verify the state of the job.

    Run either of the following commands with the job ID specified.

    REST API: GET /v1/objects/jobs/<jobId>

    CLI: job_show

    If the job state is "Succeeded", the job is completed.

  12. Obtain a list of drives and verify that the target faulty drive has been removed.

    After step 10, removal of the drive might take approximately one minute.

    REST API: GET /v1/objects/drives

    CLI: drive_list
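
    For example, under the placeholder assumptions of step 1, the following sketch checks whether the faulty drive ID still appears in the list (the "data" envelope and "id" attribute in the response are assumptions):

      curl -s -k -u <user>:<password> \
        "https://<management-address>/v1/objects/drives" | \
        jq -r '.data[].id' | grep -x "<faulty-drive-id>" || echo "drive removed"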

  13. Back up the configuration information.

    Perform this step by referring to Backing up the configuration information (Bare metal).

    If you continue operations with other procedures, you must back up the configuration information after you have completed all operations.