Understanding utilization

Ops Center Analyzer User Guide

Version
11.0.x
Audience
anonymous
Part Number
MK-99ANA002-06

The single most important cause of performance problems is an end-to-end (E2E) data path to a resource whose throughput capacity has been exceeded. Infrastructure that supports the large-scale movement of data is bound to experience throughput issues. To fulfill the promise of no single point of failure, all resources in the data path must have sufficient reserve capacity to carry increased load under failover conditions.

Performance analysis frequently begins with a determination of whether any E2E data path resources are overloaded. By using E2E data analysis in Hitachi Ops Center Analyzer (Analyzer), you can identify the particular areas of concern.

To understand how utilization is measured, consider two categories: capacity and workload. Capacity refers to the percentage of physical space occupied across storage resources. Using percentages allows you to measure the space occupied on a disk or in a parity group. Workload is less straightforward: it can be thought of as the percentage of time it takes the system to complete an operation (for example, a read or a write). Another way of expressing workload is to measure how busy a resource, such as a processor or cache, is, as a percentage.

These resources include those on the host and storage sides, as well as the fabric. Because the whole system and its parts are affected by workload, the time and space aspects of measuring utilization can be combined into throughput capacity, which is used to measure port and path utilization. Throughput capacity combines both considerations by calculating utilization as a percentage of the I/O operations a resource can sustain.
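As a minimal sketch of this idea, port or path utilization can be expressed as the observed throughput divided by the throughput the resource can sustain. The function name and the port speed below are illustrative assumptions, not values from Analyzer:

```python
# Hypothetical sketch: throughput capacity expressed as a percentage.
# The port speed and observed throughput are illustrative values only.

def port_utilization_pct(observed_mbps: float, max_mbps: float) -> float:
    """Percentage of a port's throughput capacity currently in use."""
    return 100.0 * observed_mbps / max_mbps

# An 8 Gbps FC port moves roughly 800 MB/s of payload data.
print(port_utilization_pct(600.0, 800.0))  # 75.0
```

The same calculation applies to any resource in the data path whose maximum sustainable throughput is known.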

When monitoring infrastructure resource workload with Analyzer, use the following metrics to measure utilization:

  • IOPS (I/O per second): the number of operations
  • MBps (MB per second): the amount of data transferred
  • Response time (read/write operations): the sum of service time and wait time
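The three metrics can be derived from a set of completed I/O operations observed over a measurement interval. The record fields and function below are a hypothetical illustration, not Analyzer's data model:

```python
# Illustrative sketch of the three utilization metrics, computed from a list
# of completed I/O records collected over a measurement interval.
from dataclasses import dataclass

@dataclass
class IORecord:
    bytes_transferred: int
    service_ms: float   # time the resource spent performing the operation
    wait_ms: float      # time the request queued before being serviced

def summarize(records, interval_s: float):
    # IOPS: number of operations per second of the interval.
    iops = len(records) / interval_s
    # MBps: total data moved per second of the interval.
    mbps = sum(r.bytes_transferred for r in records) / 1e6 / interval_s
    # Response time: service time plus wait time, averaged over operations.
    avg_response_ms = sum(r.service_ms + r.wait_ms for r in records) / len(records)
    return iops, mbps, avg_response_ms

records = [IORecord(8192, 1.5, 0.5), IORecord(8192, 2.0, 1.0)]
print(summarize(records, interval_s=1.0))  # (2.0, 0.016384, 2.5)
```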

Optimizing online versus batch workloads

Optimizing utilization of a storage resource for an online workload and for a batch workload are mutually exclusive goals. Consequently, online and batch workloads generally do not share the same storage access resource at the same time.

The performance of online workloads is monitored on a minute-by-minute basis. Performance monitoring alerts you to problems such as lagging response times (the most pressing) and inadequate reserves to support processing during failures (less immediate but just as significant).

For batch workloads, the average performance over time is monitored. Because high utilization is a design goal, high utilization values are not a cause for concern: maximizing utilization per resource maximizes throughput per resource. Consequently, high response time is not central to monitoring batch workloads. Instead, monitoring batch workloads becomes a question of whether there is adequate capacity to complete processing within the batch window even after a component failure occurs.
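The batch-window question reduces to simple arithmetic: does the throughput that survives a failure still move the batch data within the window? The numbers and function below are an illustrative sketch, not drawn from any particular system:

```python
# Hedged sketch: can a batch job still finish inside its window after a
# component failure removes part of the throughput capacity?

def fits_in_window(data_gb: float, healthy_mbps: float,
                   surviving_fraction: float, window_hours: float) -> bool:
    """True if the surviving capacity can move the data within the window."""
    degraded_mbps = healthy_mbps * surviving_fraction
    required_hours = (data_gb * 1000) / degraded_mbps / 3600
    return required_hours <= window_hours

# 10 TB nightly batch, 800 MB/s when healthy, one of two paths fails
# (50% of capacity survives), 6-hour window:
print(fits_in_window(10_000, 800.0, 0.5, 6.0))  # False: needs ~6.9 hours
```

In this example the healthy configuration finishes comfortably, but a single path failure pushes the run past the window, which is exactly the reserve-capacity condition the monitoring is meant to verify.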