Enhanced fault tolerance with failure domains

Content Software for File User Guide

Version
4.2.x
Part Number
MK-HCSF000-03

In the Content Software for File system, a failure domain is a group of backends that can fail together because of a single underlying issue. For instance, if all servers within a rack rely on a single power circuit or connect through a single top-of-rack (ToR) switch, that entire rack can be considered a failure domain. Consider a scenario with ten racks, each containing five Content Software for File backends, resulting in a cluster of 50 backends.

To enhance fault tolerance, you can configure a protection scheme, such as 6+2 protection, during the cluster setup. The Content Software for File system is then aware of these possible failure domains and creates each protection stripe across the racks. Because the 6+2 stripe is distributed across different racks, a complete rack failure affects only part of any stripe, so the system remains operational and no data is lost.
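
The effect of spreading a stripe across racks can be illustrated with a minimal Python sketch. It is not the product's placement algorithm: the rack and backend names, the `place_stripe` and `blocks_lost` helpers, and the random choice of racks are all hypothetical. The sketch assumes the ten-rack, 50-backend cluster described above and places each block of a 6+2 stripe in a different rack.

```python
import random

def place_stripe(racks, data, parity, rng):
    """Place one stripe block in each of (data + parity) distinct racks.

    Hypothetical helper for illustration only; the real system's
    placement logic is not described in this guide.
    """
    width = data + parity
    if width > len(racks):
        raise ValueError("stripe needs more failure domains than exist")
    chosen_racks = rng.sample(sorted(racks), width)
    # Pick any backend inside each chosen rack to hold that block.
    return [(rack, rng.choice(racks[rack])) for rack in chosen_racks]

def blocks_lost(placement, failed_rack):
    """Count the stripe blocks that sit in the failed rack."""
    return sum(1 for rack, _ in placement if rack == failed_rack)

# Ten racks with five backends each (names are made up for this example).
racks = {f"rack{r}": [f"rack{r}-backend{b}" for b in range(5)] for r in range(10)}

rng = random.Random(0)
stripe = place_stripe(racks, data=6, parity=2, rng=rng)

# Worst case over every possible single-rack failure.
worst = max(blocks_lost(stripe, rack) for rack in racks)
print(worst)  # 1
```

Because no rack holds more than one block of the stripe, any single rack failure costs at most one block, which is within the two-block tolerance of a 6+2 scheme.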

The stripe width must be less than or equal to the number of failure domains. For instance, if there are ten racks and each rack is a failure domain, a 16+4 protection scheme is not feasible, because the stripe is wider than the ten available failure domains. Therefore, the level of protection and the support for failure domains depend on the stripe width and the chosen protection scheme.
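
The constraint can be written as a short check. This is a sketch under one stated assumption: the stripe width is counted as data blocks plus parity blocks, which the guide does not spell out. The function name `fits_failure_domains` is hypothetical and is not part of any product CLI or API. The 16+4 case fails with ten racks whether or not parity blocks are counted.

```python
def fits_failure_domains(data, parity, failure_domains):
    """Return True if every stripe block can land in its own failure domain.

    Assumption: stripe width = data + parity blocks.
    """
    return data + parity <= failure_domains

print(fits_failure_domains(6, 2, 10))   # True
print(fits_failure_domains(16, 4, 10))  # False
```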