The Services Health dashboard is designed for use by Hitachi Vantara Support to troubleshoot issues. The dashboard shows health and other metrics for databases and services such as Metadata Gateway, Data Lifecycle, and Message Queue (RabbitMQ).
This dashboard contains the following rows:
- S3-Gateway and Metadata-Gateway Balance
- Shows metrics for S3 I/O distribution and balance and for database partition balance. S3 I/O and database partitions should be distributed evenly among nodes. Investigate any value that shows an imbalance greater than 15 percent.
- Database Health
- Shows metrics for database health, including database partitions per node, partition size, and the status of database protection. See Database Health metrics for descriptions of the most useful panels in this row.
- DB Information: Keyspaces and Partitions
- Shows metrics for database keyspaces and metadata partitions, including partition size by keyspace and partition, partition capacity consumed by metadata gateway, and partition splits by keyspace.
- Data Lifecycle Policies: Summary
- Shows summary metrics for all lifecycle policies, including policy counters started and completed, policies examined and completed, and the rate of policies examined and accepted.
- The following rows show metrics for the individual lifecycles that are examined and accepted by node:
- Data Lifecycle: DELETE_BACKEND_OBJECTS policy
- Data Lifecycle: TOMBSTONE_DELETION policy
- Data Lifecycle: VERSION_EXPIRATION policy
- Data Lifecycle: CHARGEBACK_POPULATION policy
- Data Lifecycle: INCOMPLETE_MPU_EXPIRATION policy
- Data Lifecycle: MIRROR_TABLE_MAINTENANCE policy
- Microservice Health
- Shows metrics for microservice anomalies, which include a service restarting, a service scaling up or down, or a service being down.
- Message-Queue Health
- Shows metrics for RabbitMQ memory usage by percentage and by node. RabbitMQ is a message broker that is used by HCP for cloud scale as a messaging intermediary.
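
The 15 percent guideline for the S3-Gateway and Metadata-Gateway Balance row can be illustrated with a short sketch. The dashboard does not state how imbalance is calculated, so this example assumes it is each node's deviation from the mean across all nodes, expressed as a percentage. The function name and the sample node names are hypothetical and are not part of HCP for cloud scale.

```python
# Minimal sketch, not the dashboard's actual calculation.
# ASSUMPTION: imbalance = |node value - mean| / mean * 100.

def imbalanced_nodes(per_node, threshold_pct=15.0):
    """Return {node: deviation_pct} for nodes that deviate from the
    mean by more than threshold_pct. per_node maps a node name to a
    count (for example, S3 requests or database partitions)."""
    if not per_node:
        return {}
    mean = sum(per_node.values()) / len(per_node)
    if mean == 0:
        # No load anywhere, so there is nothing to compare.
        return {}
    flagged = {}
    for node, value in per_node.items():
        deviation = abs(value - mean) / mean * 100.0
        if deviation > threshold_pct:
            flagged[node] = round(deviation, 1)
    return flagged


if __name__ == "__main__":
    # Hypothetical partition counts per node.
    sample = {"node-a": 100, "node-b": 100, "node-c": 100, "node-d": 140}
    print(imbalanced_nodes(sample))
```

With the sample values, only `node-d` is reported, because it holds noticeably more than the average while the other three nodes stay within the threshold.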
Database Health metrics
The following metrics in the Database Health row provide useful information for monitoring the health of your databases:
- Largest Partition Size and Partition Size Health panels
- The Largest Partition Size panel shows the size of the largest metadata partition in GB. The Partition Size Health panel provides a status message for the size. These panels are complementary.
- A metadata partition size in the HCP for cloud scale environment should not exceed 1.5 GB.
- If these panels are yellow (the partition size is high, exceeding 1.5 GB), the size requires attention. This might be a temporary state, so you can allow up to a day to see if the state returns to healthy.
- If these panels remain yellow, or if they are orange (partition size exceeds 3 GB) or red (partition size exceeds 5 GB), complete the following steps to confirm that the Metadata-Coordination service is healthy:
- In the System Management application, click .
- If the Health column contains a state other than Healthy, click Repair.
Note: If the Health column contains the Healthy state, but the Largest Partition Size and Partition Size Health panels do not return to green within 24 hours, you can also try clicking Repair.
Contact Hitachi Vantara Support if the following situations occur:
- The state on the Metadata-Coordination service page does not return to Healthy after clicking Repair.
- The Largest Partition Size and Partition Size Health panels do not return to green within a day of the Metadata-Coordination service repair.
- Database Partitions Per Node panel
- Shows the status of the number of metadata partitions per node. If there are no nodes with greater than 1000 partitions, this panel is green with a status of Normal.
- If any node has more than 1000 partitions, the panel color changes and a warning is provided as described in the following table. The risk of data unavailability increases as the number of partitions per node increases. Contact Hitachi Vantara to add nodes to distribute the partitions.

| Panel Color | Number of Partitions | Action |
| --- | --- | --- |
| Yellow | 1000-1500 | Partition count is high. More nodes are required soon. |
| Light red | 1501-2500 | IMPORTANT: Partition count is very high. Add more nodes. |
| Red | Over 2500 | WARNING: The partition count is extremely high. Add more nodes immediately. |

- Partition protection panels
- The following partition protection panels show the number of protected, degraded, unprotected, orphaned, and overprotected partitions:
- 3x Fully Protected Partitions
- Shows the number of fully protected partitions. To be fully protected, a partition must have copies on three nodes in a cluster.
- Degraded Partitions
- Shows the number of partitions with a 2x protection level because the partitions are copied on only two nodes in a cluster. Degraded partition protection does not affect the function of HCP for cloud scale, but might result in performance degradation and requires attention to provide full protection.
- Unprotected Partitions
- Shows the number of partitions with no protection because each partition exists on only one node and has no copies. Unprotected partitions require immediate attention because S3 applications are likely to experience errors for PUT requests.
- Orphaned Partitions
- Shows the number of partitions that are orphaned. An orphaned partition is one that is empty and no longer used.
- Overprotected Partitions
- Shows the number of partitions that have 4x protection. That is, the partitions are copied on four nodes in a cluster. This situation might occur because a node containing a partition was recovered after a third copy of the partition was created on another node in the cluster.
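
The thresholds described for the Database Health row can be summarized in a short sketch. It only restates the limits given in this section and does not reproduce how the dashboard computes its panels. The function names are hypothetical. Treating exactly 1000 partitions per node as normal is an assumption that follows the panel description.

```python
# Minimal sketch of the Database Health thresholds described above.
# All function names are hypothetical illustrations.

def partition_size_status(largest_gb):
    """Map the largest metadata partition size (GB) to a panel color."""
    if largest_gb > 5.0:
        return "red"
    if largest_gb > 3.0:
        return "orange"
    if largest_gb > 1.5:
        return "yellow"
    return "green"


def should_check_coordination_service(status, hours_in_state):
    """True when the text advises confirming that the
    Metadata-Coordination service is healthy: orange or red at any
    time, or yellow that persists for a day."""
    if status in ("orange", "red"):
        return True
    return status == "yellow" and hours_in_state >= 24


def partitions_per_node_status(count):
    """Map a node's partition count to a panel color.
    ASSUMPTION: exactly 1000 counts as normal (green)."""
    if count > 2500:
        return "red"
    if count > 1500:
        return "light red"
    if count > 1000:
        return "yellow"
    return "green"


def protection_level(nodes_holding_partition):
    """Classify a partition by how many nodes hold it. Orphaned
    partitions are defined by being empty and unused, not by this
    count, so they are not modeled here."""
    levels = {
        1: "unprotected",
        2: "degraded",
        3: "fully protected",
        4: "overprotected",
    }
    return levels.get(nodes_holding_partition, "unknown")


if __name__ == "__main__":
    print(partition_size_status(2.0))
    print(should_check_coordination_service("yellow", 30))
    print(partitions_per_node_status(1800))
    print(protection_level(2))
```

Keeping each threshold in its own small function makes it easy to adjust a limit if the documented values change.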