Use the following best practices for infrastructure reporting.
Report timing
Most performance issues are analyzed using one-minute data intervals, although some problems require analysis at shorter intervals. Analysis of one-minute data is generally limited to durations of one or two days. Shorter intervals prevent averaging from muting peaks in performance metric values.
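The muting effect of averaging can be illustrated with a minimal sketch. The sample values, the one-second granularity, and the `downsample_mean` helper are hypothetical and chosen only for illustration; they are not product data or a product API.

```python
from statistics import mean


def downsample_mean(samples, factor):
    """Average consecutive groups of `factor` samples (a trailing partial group is averaged too)."""
    return [mean(samples[i:i + factor]) for i in range(0, len(samples), factor)]


# Hypothetical one-second response times in ms: a steady 2 ms with a 5-second burst at 50 ms.
per_second = [2.0] * 60
per_second[30:35] = [50.0] * 5

per_minute = downsample_mean(per_second, 60)

print(max(per_second))  # 50.0 -> the peak is visible at the short interval
print(per_minute[0])    # 6.0  -> the same minute, averaged, hides the peak
```

In this example a burst that reaches 50 ms appears as a 6 ms value once it is averaged over the full minute, which is why a shorter interval is sometimes needed to see the problem at all.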
Troubleshoot high response times
When you monitor your infrastructure resources, the most significant metric to watch for online transaction workloads is the I/O rate: the higher the I/O rate, the more transactions the application is processing. Maintain healthy response times in an OLTP environment, which mostly generates random-access I/O. Read I/O response times are expected to be higher than write response times. Use the following guidelines when considering response time thresholds:
- The appropriate threshold depends on the application requirements and the SLA.
- Because LUN response time directly affects applications, monitor this indicator on key LUNs to determine how it changes as load increases.
- Look for the worst-performing LUNs and correlate them with the host disks.
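The guidelines above can be sketched as a small ranking routine. This is a minimal illustration under stated assumptions: the `LunSample` record, the LUN names, the sample values, and the 8 ms threshold are all hypothetical, and the real threshold comes from the application requirements and the SLA.

```python
from dataclasses import dataclass


@dataclass
class LunSample:
    lun: str            # hypothetical LUN identifier
    iops: float         # I/O rate at the time of the sample
    response_ms: float  # response time in milliseconds


def worst_luns(samples, threshold_ms, top_n=3):
    """Return up to `top_n` samples above the threshold, slowest first."""
    over = [s for s in samples if s.response_ms > threshold_ms]
    return sorted(over, key=lambda s: s.response_ms, reverse=True)[:top_n]


def response_delta(baseline, current):
    """Change in response time (ms) between a baseline sample and a later one."""
    return current.response_ms - baseline.response_ms


samples = [
    LunSample("A", 1200, 4.0),
    LunSample("B", 2500, 12.5),
    LunSample("C", 1800, 9.0),
    LunSample("D", 3000, 21.0),
]

# The 8 ms threshold is an assumption for illustration only.
worst = worst_luns(samples, threshold_ms=8.0, top_n=2)
print([s.lun for s in worst])  # ['D', 'B']

# Delta for one key LUN as its load increases from 1000 to 3000 IOPS.
delta = response_delta(LunSample("D", 1000, 6.0), LunSample("D", 3000, 21.0))
print(delta)  # 15.0
```

The LUNs returned by `worst_luns` are the candidates to correlate with the corresponding host disks.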
Maximum recommended array group utilization
- 50% during normal operations
- Utilization reserves are required to accommodate failure
- For batch workloads, as high as possible, because batch performance is typically measured in elapsed time
- Expect maximums of 70-80% (depending on the burst profile of the initiator)
- Average utilization over the time remaining in the batch window should not exceed 50%
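The utilization limits above could be checked along the following lines. This is a sketch, not a product feature: it reads the 50% limit as applying to normal operations and the last three bullets as applying to a batch window, which is an interpretation of the list, and it takes 80% as the upper end of the expected 70-80% maximum. The function names and sample values are hypothetical.

```python
from statistics import mean

NORMAL_MAX_PCT = 50.0      # limit during normal operations
BATCH_PEAK_MAX_PCT = 80.0  # assumption: upper end of the expected 70-80% maximum
BATCH_AVG_MAX_PCT = 50.0   # limit on average utilization over the batch window


def normal_ok(util_pct):
    """True if array group utilization is within the normal-operations limit."""
    return util_pct <= NORMAL_MAX_PCT


def batch_findings(util_samples_pct):
    """Return a list of findings for utilization samples taken across a batch window."""
    findings = []
    if max(util_samples_pct) > BATCH_PEAK_MAX_PCT:
        findings.append("peak above 80%")
    if mean(util_samples_pct) > BATCH_AVG_MAX_PCT:
        findings.append("average above 50%")
    return findings


print(normal_ok(45.0))                   # True
print(batch_findings([30, 75, 40, 35]))  # []
print(batch_findings([60, 85, 55, 40]))  # ['peak above 80%', 'average above 50%']
```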
VSP processor maximum planned utilization with capacity reserves to accommodate failure
- Each VSP Virtual Storage Director (VSD) manages a specific list of LDEVs.
- VSDs are redundant MPB pairs with 1:1 failover.
- VSD is monitored by analyzing MPB utilization, as shown in the following table.
VSD cache considerations
- The cache for each CLPR is allocated to the respective VSDs.
- The cache allocation for each VSD is initially uniform among VSDs, but can be dynamically reallocated in response to changing load conditions.
- Inflow control due to high write pending levels is local to the VSD/CLPR pair having a high write pending level.
- Accelerated de-staging due to high write pending levels is a system-wide activity.
- Up to 30% write pending is considered normal.
- Frequent or sustained increases to 40% deserve attention.
- Frequent or sustained increases to 50% deserve prompt attention.
- At 70%, emergency de-stage/inflow control is invoked.
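The write pending levels above can be expressed as a simple classifier. This is a minimal sketch: the function names are hypothetical, the "elevated" label for the 30-40% band is an assumption (the guidelines do not name that range), and "sustained" is approximated here as a run of consecutive samples whose length you would choose yourself.

```python
def classify_write_pending(pct):
    """Map a write pending percentage to the attention level in the guidelines."""
    if pct >= 70:
        return "emergency de-stage/inflow control"
    if pct >= 50:
        return "prompt attention"
    if pct >= 40:
        return "attention"
    if pct <= 30:
        return "normal"
    return "elevated"  # assumption: the 30-40% band is not named in the guidelines


def is_sustained(samples_pct, level_pct, min_consecutive=3):
    """Return True if at least `min_consecutive` samples in a row reach `level_pct`."""
    run = 0
    for pct in samples_pct:
        run = run + 1 if pct >= level_pct else 0
        if run >= min_consecutive:
            return True
    return False


print(classify_write_pending(25))              # normal
print(classify_write_pending(45))              # attention
print(is_sustained([35, 42, 44, 41, 30], 40))  # True
```

Because inflow control is local to a VSD/CLPR pair, a check like this would be applied to the write pending level of each pair rather than to a single system-wide value.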