Flow of recovery operations to be performed when a failure occurs at the primary site of a Universal Replicator pair

REST API Reference Guide for Virtual Storage Platform 5000, Virtual Storage Platform E Series, and Virtual Storage Platform G/F Series

Version: 93-07-0x, 90-09-0x, 88-08-10
Part Number: MK-98RD9014-17
If a failure occurs at the primary site of a Universal Replicator pair, you can use the REST API to perform a failover to the secondary site, in order to ensure continuous operation. After the recovery of the primary site is complete, you can return the pair to the state it was in before the failure occurred by switching operations back to the primary site from the secondary site.
The operations for recovering the Universal Replicator pair can be divided into three general phases:
  1. Perform a failover to switch operations over to the secondary site.
  2. Copy data from the secondary site to the primary site.
  3. Return the pair relationship between the primary site and the secondary site to the state it was in before the failure occurred.
The following explains the flow of operations in each phase.

Performing a failover to switch over business operations to the secondary site

After a failure is detected at the primary site, switch the roles of the primary volume and the secondary volume of the Universal Replicator pair, so that data can be written to the secondary volume, and business operations can continue at the secondary site.

Note: While the primary site is down, its pair information cannot be obtained. To identify the volume at the secondary site to which business operations are to be switched over, you therefore need to know in advance which storage system is paired with the primary volume of the Universal Replicator pair at the primary site.

The following figure shows the flow of operations:

Stop the business system

When a failure is detected at the primary site, stop the business system, and make sure that there is no I/O to or from the hosts.

Get copy group information or pair information

Get a list of the copy groups on the storage system at the secondary site, and then use that list to get the copy pair information for the secondary site. These API requests require remote storage system information, either in a query parameter or in the object ID. In this situation, specify NotSpecified for that value.
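As a minimal sketch of the request described above, the following Python builds a URL for listing copy groups with NotSpecified as the remote storage system information. The host name, the resource path `remote-mirror-copygroups`, and the query parameter name `remoteStorageDeviceId` are assumptions made for illustration and are not taken from this page; confirm them in the API reference for your storage system.

```python
from urllib.parse import urlencode

# Value the guide says to specify while the primary site cannot be queried.
REMOTE_NOT_SPECIFIED = "NotSpecified"


def copy_group_list_url(base_url: str,
                        remote_storage: str = REMOTE_NOT_SPECIFIED) -> str:
    """Build the URL for listing copy groups at the secondary site.

    The resource path and the query parameter name are assumed for this
    sketch, not confirmed by the text above.
    """
    query = urlencode({"remoteStorageDeviceId": remote_storage})
    return f"{base_url.rstrip('/')}/v1/objects/remote-mirror-copygroups?{query}"


# Hypothetical base URL of the REST API server at the secondary site.
url = copy_group_list_url("https://secondary.example.com/ConfigurationManager")
print(url)
```

The same NotSpecified value would go into the object ID when a single copy group or pair is requested.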

Identify the pair to which business operations are to be switched over

Based on the pair information for the secondary site, identify the pair to which business operations are to be switched over.
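The lookup can be sketched as a simple filter over the pair information. The field names `copyPairName`, `pvolLdevId`, and `svolLdevId`, and the sample values, are assumptions made for illustration. The primary volume ID is the one you recorded in advance, as the earlier note requires.

```python
from typing import List, Optional


def find_failover_pair(pairs: List[dict],
                       known_pvol_ldev_id: int) -> Optional[dict]:
    """Return the pair whose primary volume matches the ID recorded in advance."""
    for pair in pairs:
        if pair.get("pvolLdevId") == known_pvol_ldev_id:
            return pair
    return None  # no matching pair in the information that was retrieved


# Hypothetical pair information as retrieved from the secondary site.
sample_pairs = [
    {"copyPairName": "pair1", "pvolLdevId": 1569, "svolLdevId": 2835},
    {"copyPairName": "pair2", "pvolLdevId": 1570, "svolLdevId": 2836},
]

target = find_failover_pair(sample_pairs, known_pvol_ldev_id=1570)
print(target["copyPairName"], target["svolLdevId"])
```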

Switch over business operations to a volume at the secondary site

Specify the pair or copy group, and then switch the roles of the primary volume and the secondary volume. Data can now be written to the secondary volume.

Note: When auto is specified as the takeover execution mode, the storage systems at the secondary site automatically try to resynchronize with the storage systems at the primary site. If this resynchronization succeeds, you do not need to split and then resynchronize the pair as described in the flow for copying data from the secondary site to the primary site. To check whether the resynchronization succeeded, get the pair information.

Restart the business system (at the secondary site)

Restart the operations of the business system at the secondary site.

Copying data from the secondary site to the primary site

After recovery is complete for the primary site, apply the data that was written to the secondary site during the failure to the primary site. The following figure shows the flow of operations:

Stop the business system

Stop the business system, and make sure that there is no I/O to or from the hosts.

Get copy group information or pair information

Get copy pair information based on the copy group information, and then check the pair status.

Split or delete the pair

Perform one of the following operations if necessary, according to the pair status:

  • If the pair status of the S-VOL is SSWS, split the pair.
  • If the pair status of the P-VOL or the S-VOL is SMPL, delete the pair.

Resynchronize or re-create the pair

Perform one of the following operations if necessary, according to the pair status:

  • If the pair status of the S-VOL is SSWS, resynchronize the pair at the secondary site (the S-VOL). At this time, specify true for doSwapSvol.
  • If the pair status at both the primary site and the secondary site is SMPL, create a pair, specifying the volume at the secondary site as the P-VOL.
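The two status-based decisions above can be sketched as a pair of small functions, one per list. This is only a direct transcription of the conditions in the text. The function names, the returned dictionaries, and the sample P-VOL status `PSUE` are hypothetical; re-read the pair status before each step instead of relying on an earlier reading.

```python
from typing import Optional


def split_or_delete(pvol_status: str, svol_status: str) -> Optional[str]:
    """Choose the operation for the "Split or delete the pair" step."""
    if svol_status == "SSWS":
        return "split"
    if "SMPL" in (pvol_status, svol_status):
        return "delete"
    return None  # neither condition applies, so nothing to do


def resync_or_create(pvol_status: str, svol_status: str) -> Optional[dict]:
    """Choose the operation for the "Resynchronize or re-create the pair" step."""
    if svol_status == "SSWS":
        # Resynchronize at the secondary site (the S-VOL) with doSwapSvol true.
        return {"operation": "resync", "site": "secondary", "doSwapSvol": True}
    if pvol_status == "SMPL" and svol_status == "SMPL":
        # Re-create the pair with the P-VOL at the secondary site.
        return {"operation": "create", "pvol_site": "secondary"}
    return None


# Illustrative status readings only.
print(split_or_delete("PSUE", "SSWS"))
print(resync_or_create("SMPL", "SMPL"))
```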

Returning the pair relationship between the primary site and the secondary site to the state it was in before the failure

When all pair statuses are PAIR and all data on the secondary site has been applied to the primary site, normal operation can be restarted at the primary site. The following figure shows the flow of operations:

Get copy group information or pair information

Get pair information based on copy group information, and make sure that the status of the target pair is PAIR.
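Because a pair can take some time to reach PAIR, this check is often written as a polling loop. The sketch below assumes a caller-supplied function, here called `fetch_statuses` (a hypothetical name), that returns the current status of each target pair. The timeout and interval values, and the sample status `COPY`, are arbitrary illustrations.

```python
import time
from typing import Callable, Iterable


def wait_until_all_pair(fetch_statuses: Callable[[], Iterable[str]],
                        timeout_s: float = 600.0,
                        interval_s: float = 10.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until every target pair reports PAIR, or until the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        statuses = list(fetch_statuses())
        # An empty result is treated as "not ready", not as success.
        if statuses and all(status == "PAIR" for status in statuses):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Simulated readings stand in for real pair information requests.
readings = iter([["COPY", "COPY"], ["PAIR", "COPY"], ["PAIR", "PAIR"]])
ready = wait_until_all_pair(lambda: next(readings), sleep=lambda _: None)
print(ready)
```

Passing the fetch function and the sleep function as parameters keeps the loop independent of any particular HTTP client and makes it easy to test without waiting.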

Split the pair

Split the pair.

Resynchronize the pair

Resynchronize the pair at the primary site (P-VOL). Specify true for doSwapSvol.
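For illustration, a request body for this resynchronization might be assembled as follows. Only the attribute name doSwapSvol and its value come from the text above. The enclosing `parameters` object is an assumption about the request format, and any other attributes that the request requires are omitted.

```python
import json


def resync_request_body(do_swap_svol: bool = True) -> str:
    """Serialize a resynchronization request body with doSwapSvol set.

    The "parameters" wrapper is assumed for this sketch.
    """
    body = {"parameters": {"doSwapSvol": do_swap_svol}}
    return json.dumps(body)


payload = resync_request_body()
print(payload)
```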

Get copy group information or pair information

Get pair information based on copy group information, and make sure that the status of the target pair is PAIR.

The pair relationship between the primary site and the secondary site and the copy direction are returned to the state they were in before the failure, and the business system can now be restarted.