Settings and functions for fault tolerance supported by VSP One SDS Block

Virtual Storage Platform One SDS Block Cloud Setup and Configuration Guide

Version
1.14.x
Part Number
MK-24VSP1SDS008-01

VSP One SDS Block supports the following functions for fault tolerance. The combination of these functions determines the fault tolerance of the entire storage cluster.

  • User data protection method

  • Redundancy of the storage controller

  • Redundancy of the cluster master node

  • Spread placement group

In this manual, fault-tolerant system configurations are defined as follows: If the system can continue to operate when a maximum of one storage node or drive is faulty, the system has a plus-1 redundant configuration. If the system can continue to operate when a maximum of two storage nodes or drives are faulty, the system has a plus-2 redundant configuration.

The ability to continue system operation means that the system can tolerate errors that might otherwise cause shutdown, inability to exchange data with a host, or data loss. In addition, when spread placement groups are set, the system can tolerate failures in the AWS hardware up to the number of failures that the redundant configuration allows.

Note:

If the system becomes inoperable, for example because more failures occur than the redundant configuration allows, you might need to re-install VSP One SDS Block. User data cannot be restored by re-installing VSP One SDS Block, so back up user data to other storage media in advance.

If the system becomes inoperable, the maximum time from the occurrence of the failure until the system stops accepting host I/O is 2 minutes and 20 seconds for a plus-1 redundant configuration and 3 minutes and 15 seconds for a plus-2 redundant configuration. Take these times into account when designing the application.

The following shows features of plus-1 and plus-2 redundant configurations. Select either of these configurations according to your priorities for using VSP One SDS Block.

Configuration: Plus-1 redundant configuration

Features:

  • More volumes can be created

  • Higher write performance

  • Shorter time required to complete rebuilding

Configuration: Plus-2 redundant configuration

Features:

  • Higher fault tolerance

The table below shows combinations of settings that must be specified to create a plus-1 or plus-2 redundant configuration. Settings other than the combinations below are not allowed.

CAUTION:

If a failure occurs in the storage cluster while the write back mode with cache protection is disabled, data on the snapshot volume might be lost. When the write back mode with cache protection is enabled, data on the snapshot volume is protected.

However, even if the write back mode with cache protection is enabled, if failures exceed the storage controller redundancy, the data on the snapshot volume is not protected.

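Configuration 1: Plus-1 redundant configuration (see note 2)

  • User data protection method: Mirroring Duplication

  • Redundancy of the storage controller (see note 1): OneRedundantStorageNode (degree = 2)

  • Number of cluster master nodes: 3 nodes

  • Number of fault domains: 1

Configuration 2: Plus-2 redundant configuration (see note 2)

  • User data protection method: HPEC 4D+2P

  • Redundancy of the storage controller (see note 1): TwoRedundantStorageNodes (degree = 3)

  • Number of cluster master nodes: 5 nodes

  • Number of fault domains: 1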

1. You do not need to explicitly specify the degree of redundancy of the storage controller, because it is automatically determined depending on the user data protection method.

2. In the following cases, only one failure is assumed to have occurred, irrespective of the number of failures:

  • One or more drive failures occurred on a faulty storage node.

  • Drive failures occurred on a single storage node.
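The rule that only the listed combinations are allowed can be sketched as a small lookup. This is a minimal illustration, not product code: the `RedundancyConfig` class, the `ALLOWED` table, and the `classify` function are hypothetical names introduced here, and the string values are simply the setting names that appear in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyConfig:
    """One combination of function settings (hypothetical helper type)."""
    protection_method: str      # user data protection method
    controller_redundancy: str  # determined automatically by the product
    cluster_master_nodes: int
    fault_domains: int


# The two combinations listed in this section; no other combination is allowed.
ALLOWED = {
    "plus-1": RedundancyConfig("Mirroring Duplication", "OneRedundantStorageNode", 3, 1),
    "plus-2": RedundancyConfig("HPEC 4D+2P", "TwoRedundantStorageNodes", 5, 1),
}


def classify(config):
    """Return "plus-1" or "plus-2" if the settings match a listed row, else None."""
    for name, allowed in ALLOWED.items():
        if config == allowed:
            return name
    return None


print(classify(RedundancyConfig("HPEC 4D+2P", "TwoRedundantStorageNodes", 5, 1)))  # plus-2
print(classify(RedundancyConfig("HPEC 4D+2P", "OneRedundantStorageNode", 3, 1)))   # None
```

A check like this is only useful as a pre-flight sanity test in your own tooling; the product itself enforces the allowed combinations.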

The following sections provide an overview, features, and notes for each function.

User data protection methods

VSP One SDS Block supports HPEC (Hitachi Polyphase Erasure Coding) and Mirroring as a way to protect user data. HPEC is the Hitachi proprietary data protection method developed for SDS systems with low network bandwidth between storage nodes. HPEC stores user data on a local drive. Mirroring is a data protection method that stores a copy of user data on another storage node.

For HPEC, select 4D+2P. For Mirroring, select Duplication. Note, however, that the selected method might restrict the combination of functions that can be used.

  • HPEC 4D+2P (4 data +2 parity):

    Select this method if the number of failures allowed is the priority. The number of storage node or drive failures allowed is two.

  • Mirroring Duplication (1 data +1 copydata):

    Select this method if performance is the priority. The Mirroring method is superior to the HPEC methods in fault tolerance against storage node or drive failures, as well as in performance during normal operation. The number of storage node or drive failures allowed is one.

For HPEC 4D+2P

  • User data and its parities are stored on six or more different storage nodes for redundancy.

  • At least six storage nodes are required.

  • Users can use a maximum of 50 to 65% of physical capacity.

    However, if the rebuild capacity policy (rebuildCapacityPolicy) is set to "Fixed" (default), users can use a maximum of 50 to 65% of the physical capacity excluding the rebuild capacity on each storage node. For details about the rebuild capacity, see Rebuild capacity of a storage pool in the VSP One SDS Block System Administrator Operation Guide.

  • The number of storage node or drive failures allowed is two. The number is the sum of the number of defective storage nodes and the number of defective drives. However, the number is counted as one failure in the following cases.

    • One or more drive failures occurred on a faulty storage node.
    • Drive failures occurred on a single storage node.
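One way to read the counting rule above is that the failure count equals the number of distinct storage nodes that are faulty or contain at least one faulty drive. The sketch below encodes that reading; it is an interpretation of the rule, not product logic, and the function names and the node and drive labels are hypothetical.

```python
def count_failures(faulty_nodes, faulty_drives):
    """Count failures under the reading described above.

    faulty_nodes:  iterable of storage node names that are faulty.
    faulty_drives: iterable of (node_name, drive_name) pairs for faulty drives.
    Drives on a faulty node, and multiple drives on one node, collapse into
    a single failure because only distinct affected nodes are counted.
    """
    affected = set(faulty_nodes)
    affected.update(node for node, _drive in faulty_drives)
    return len(affected)


def is_tolerated(faulty_nodes, faulty_drives, allowed=2):
    """True if the counted failures do not exceed the allowed number (2 for HPEC 4D+2P)."""
    return count_failures(faulty_nodes, faulty_drives) <= allowed


# A faulty node plus two faulty drives on that same node counts as one failure.
print(count_failures(["node1"], [("node1", "d0"), ("node1", "d1")]))  # 1
# Drive failures on two different nodes count as two failures.
print(count_failures([], [("node2", "d0"), ("node3", "d0")]))         # 2
# A faulty node plus drive failures on two other nodes exceeds the limit of two.
print(is_tolerated(["node1"], [("node2", "d0"), ("node3", "d0")]))    # False
```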

Figure callouts for HPEC 4D+2P:

(A) Store data locally and reduce network communication during read

(B) Primary coding: Coding reduces data volume for two redundancies

(C) Secondary coding: Data storage capacity is reduced to achieve capacity efficiency that is equivalent to EC (Erasure Coding)

For Mirroring Duplication

  • User data and its copies are stored on two different storage nodes for redundancy.

  • At least three storage nodes are required.

  • Users can use a maximum of 40 to 48% of physical capacity.

    However, if the rebuild capacity policy (rebuildCapacityPolicy) is set to "Fixed" (default), users can use a maximum of 40 to 48% of the physical capacity excluding the rebuild capacity on each storage node. For details about the rebuild capacity, see Rebuild capacity of a storage pool in the VSP One SDS Block System Administrator Operation Guide.

  • The read performance of this method is equivalent to the HPEC 4D+2P method but the write performance is superior to the HPEC 4D+2P method. The fault tolerance against storage node or drive failures is also superior to the HPEC methods.

  • The allowable number of defective storage nodes or drives is 1. The number is the sum of the number of defective storage nodes and the number of defective drives. However, the number is counted as one failure in the following cases.

    • One or more drive failures occurred on a faulty storage node.
    • Drive failures occurred on a single storage node.

    However, even if two or more failures occur, they are tolerated unless either of the following conditions is met:

    • Condition 1:

      Storage node or drive failures occur on both storage nodes that redundant storage controllers belong to. For details about storage controllers, see Redundancy of the storage controller.

    • Condition 2:

      Failures occur on two or more cluster master nodes. For details about cluster master nodes, see Redundancy of the cluster master node.
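The two conditions above can be sketched as a simple check. This is a minimal illustration under stated assumptions: the function name, the `controller_pairs` and `master_nodes` inputs, and the node labels are hypothetical, and the sketch assumes that any failure on a cluster master node (node or drive) counts toward Condition 2.

```python
def mirroring_tolerates(affected_nodes, controller_pairs, master_nodes):
    """Return True if the failures avoid both Condition 1 and Condition 2.

    affected_nodes:   nodes that are faulty or contain a faulty drive.
    controller_pairs: (node_a, node_b) pairs; each pair holds one redundant
                      storage controller (hypothetical input format).
    master_nodes:     nodes acting as cluster master nodes.
    """
    affected = set(affected_nodes)
    # Condition 1: both nodes of a redundant storage controller are affected.
    if any(a in affected and b in affected for a, b in controller_pairs):
        return False
    # Condition 2: two or more cluster master nodes are affected (assumption:
    # drive failures on a master node are counted the same as node failures).
    if len(affected & set(master_nodes)) >= 2:
        return False
    return True


pairs = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"),
         ("n4", "n5"), ("n5", "n6"), ("n6", "n1")]
masters = {"n1", "n3", "n5"}

print(mirroring_tolerates({"n2", "n4"}, pairs, masters))  # True
print(mirroring_tolerates({"n1", "n2"}, pairs, masters))  # False (Condition 1)
print(mirroring_tolerates({"n1", "n3"}, pairs, masters))  # False (Condition 2)
```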

For details about designing the capacity for the HPEC 4D+2P or Mirroring Duplication method, see Capacity Design (for HPEC 4D+2P) or Capacity design (for Mirroring) in the VSP One SDS Block System Administrator Operation Guide.
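For a rough first estimate before consulting the capacity design procedures, the usable-capacity percentages stated in this section can be applied directly. The sketch below is a simplification: the function name and parameters are hypothetical, and the rebuild capacity is modeled as a single total to subtract, whereas the product reserves it on each storage node.

```python
# Usable share of physical capacity, in percent, as stated in this section.
USABLE_PERCENT = {
    "HPEC 4D+2P": (50, 65),
    "Mirroring Duplication": (40, 48),
}


def usable_capacity_range(physical_tib, method, rebuild_tib=0):
    """Return (low, high) usable capacity in TiB for a protection method.

    rebuild_tib is the total capacity reserved for rebuilding (assumed to be
    summed over all storage nodes); it is excluded before the percentage is
    applied, as described for the "Fixed" rebuild capacity policy.
    """
    low_pct, high_pct = USABLE_PERCENT[method]
    base = physical_tib - rebuild_tib
    if base < 0:
        raise ValueError("rebuild capacity exceeds physical capacity")
    return (base * low_pct / 100, base * high_pct / 100)


print(usable_capacity_range(100, "HPEC 4D+2P"))                  # (50.0, 65.0)
print(usable_capacity_range(100, "Mirroring Duplication"))       # (40.0, 48.0)
print(usable_capacity_range(100, "HPEC 4D+2P", rebuild_tib=10))  # (45.0, 58.5)
```

Treat the result as an upper bound for planning only; the actual usable capacity depends on the detailed capacity design in the Operation Guide.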

Redundancy of the storage controller

Storage controllers are part of VSP One SDS Block processes that manage storage node capacities and volumes.

The storage cluster has the same number of storage controllers as storage nodes. Each storage controller manages the capacity and volumes of one storage node and can also manage one or more other storage nodes for redundancy in case a storage node failure occurs.

The following two settings determine the degree of redundancy of the storage controller. You do not need to explicitly specify the degree of redundancy of the storage controller, because it is automatically determined depending on the user data protection method.

  • OneRedundantStorageNode (degree = 2)

    The system can continue to operate if a maximum of one storage node becomes faulty.

    The following shows an example of assigning storage controllers to storage nodes if OneRedundantStorageNode is selected:

  • TwoRedundantStorageNodes (degree = 3)

    The system can continue to operate if a maximum of two storage nodes become faulty.

    The following shows an example of assigning storage controllers to storage nodes if TwoRedundantStorageNodes is selected:
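As a rough illustration of how the degree relates to the number of tolerated storage node failures, the sketch below assumes a simple ring layout in which each storage controller spans its own storage node plus the next (degree - 1) storage nodes. The ring layout, the `SC-n` names, and both function names are assumptions made for illustration only and do not describe the product's actual assignment logic.

```python
def assign_controllers(nodes, degree):
    """Map each hypothetical controller name to the `degree` nodes it spans."""
    count = len(nodes)
    if not 1 <= degree <= count:
        raise ValueError("degree must be between 1 and the number of nodes")
    # Controller i spans node i and the following (degree - 1) nodes in the ring.
    return {
        f"SC-{i + 1}": [nodes[(i + k) % count] for k in range(degree)]
        for i in range(count)
    }


def all_controllers_survive(assignment, failed_nodes):
    """True if every controller still has at least one healthy node."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in span)
        for span in assignment.values()
    )


# OneRedundantStorageNode (degree = 2) on three storage nodes.
layout = assign_controllers(["node1", "node2", "node3"], degree=2)
print(layout["SC-3"])                                       # ['node3', 'node1']
print(all_controllers_survive(layout, ["node1"]))           # True
print(all_controllers_survive(layout, ["node1", "node2"]))  # False
```

With degree = 3, the same check passes for any two failed storage nodes, which matches the stated tolerance of TwoRedundantStorageNodes.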