Hitachi Ops Center Analyzer provides an intuitive UI for performance monitoring, management, and troubleshooting. Analyzer detail view collects data from monitored targets (such as storage systems, hosts, and switches) using software probes that support each device or environment. Analyzer detail view also provides historical trend analysis and extensive report generation capabilities.
This analytics solution provides end-to-end monitoring and troubleshooting capabilities for your infrastructure resources, from host to storage system. The basic workflow for Performance Analytics troubleshooting is called the MAPE loop:
- Monitor
- Analyze
- Plan
- Execute
When reviewing and evaluating reports and event information on the Ops Center Analyzer Dashboard, you can also perform a deep dive analysis by launching the Analyzer detail view UI. The deep dive is part of the Analyze segment of the MAPE loop workflow.
The following workflow is an example of how to use this troubleshooting methodology as an infrastructure administrator who manages user resources (such as consumers, VMs, and volumes) and system resources (such as cache, ports, CPUs, and disks).
Viewing the dashboard
As an infrastructure administrator, you set up dynamic thresholds on the user resources you are monitoring. After seeing nine critical alerts on the VM/Host resource gauge, you decide to troubleshoot a threshold violation.
You browse the resources with critical alerts and select the target VM to analyze in the E2E View.
Using E2E or Sparkline views
The E2E view represents the topology of infrastructure resources: from host, to fabric switch, to storage system. You set the base point of analysis on the target resource. This view enables you to see the relationship between resources.
To move deeper into the underlying resources, you can launch the Sparkline view, which presents multiple charts that track performance by component. Use this view to correlate performance trends between user and system resources.
Using additional troubleshooting tools
Ops Center Analyzer offers multiple troubleshooting tools for isolating a bottleneck candidate and identifying the root cause. You can launch any of the following tools for further analysis:
- Verify Bottleneck: Use at the initial stage of analysis to compare performance charts of the base point of analysis with the bottlenecked candidate.
- Identify Affected Resources: Use to display the user resources that rely on the bottlenecked resource.
- Analyze Shared Resources: Use if you suspect that the root cause of the problem is resource contention (a noisy neighbor that disrupts the balance of resource usage). You compare performance charts of the bottleneck candidate with those of the resources that use it. After comparing performance across a number of resources with Analyze Shared Resources, you isolate the actual bottleneck.
- Analyze Related Changes: Use if Analyze Shared Resources does not reveal the actual bottleneck (noisy neighbor), or if you suspect that the root cause of the problem is a recent configuration change. In this view, you compare performance charts with configuration events. The bar graph portion of the chart represents the configuration changes made at a particular time. You can click on a bar to list those changes.
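The noisy-neighbor idea behind Analyze Shared Resources can be illustrated with a small sketch. This is not how Ops Center Analyzer computes anything internally; the function `rank_noisy_neighbors`, the resource names, and the sample values are all hypothetical. The sketch only shows the reasoning: during the time window of the problem, rank the consumers of a shared resource by their share of the total load.

```python
from typing import Dict, List, Tuple


def rank_noisy_neighbors(
    usage_by_consumer: Dict[str, List[float]],
    window: range,
) -> List[Tuple[str, float]]:
    """Rank consumers by their share of the total load inside the window.

    usage_by_consumer maps a consumer name to equally spaced load samples
    (for example IOPS); window selects the sample indexes to examine.
    """
    totals = {
        name: sum(series[i] for i in window)
        for name, series in usage_by_consumer.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # No load in the window, so nothing to rank.
    shares = [(name, total / grand_total) for name, total in totals.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical IOPS samples for three VMs that share one storage resource.
samples = {
    "vm-a": [100, 110, 900, 950, 120],
    "vm-b": [200, 210, 190, 205, 200],
    "vm-c": [50, 40, 60, 45, 55],
}

# Samples 2 and 3 cover the (assumed) period of the threshold violation.
ranking = rank_noisy_neighbors(samples, range(2, 4))
for name, share in ranking:
    print(f"{name}: {share:.0%} of load in window")
```

In this made-up data set, `vm-a` dominates the load during the window, which would make it the first resource to compare against the bottleneck candidate.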
Performing a deep dive analysis
Regardless of which tool you use, after you have isolated the bottleneck candidate and validated the root cause, you can collect more information to understand its origin. For example, suppose you have identified a storage system as the bottleneck and now want to understand how the problem affects other resources, or how they affect it. This phase of the troubleshooting analysis is called the deep dive. In a deep dive analysis, you can compare the data of various components from the resource tree, which displays all the resources and their components in your infrastructure, and run a customized report against that data.
To proceed with the deep dive, launch the Analyzer detail view UI, which provides detailed reports at the component level. You can launch this component-level view from the following windows in the Ops Center Analyzer UI during analysis:
- E2E view
- Sparkline view
- Performance tab of the Show detail window for a resource
- Analyze Shared Resources
- Analyze Related Changes
When analyzing system resources in Analyzer detail view, you can view performance charts based on various metrics to correlate components with resource performance. For example, you have validated the root cause of the storage system bottleneck, but you want to perform further analysis in Analyzer detail view.
The following figure examines the performance of the volume from the VM side. This report, LDEV IOPS versus Response Time, displays spikes at specific times, which you can then use as reference points for when the I/O activity was particularly intensive during otherwise typical workloads.
Digging deeper, you discover the storage systems and volumes associated with a particular VM. You cross-reference the resources in the VM performance chart and determine the component whose performance correlates with that of the VM. In this example, the resource that correlates with VM performance is the cache on the storage side (CLPR). This workload is typically intensive, but you realize that the times when the resource reached 100% correlate with the spikes in the LDEV IOPS versus Response Time report.
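The cross-referencing step described here is done visually in Analyzer detail view. As a rough illustration of the same reasoning, the following sketch computes a Pearson correlation coefficient between a VM metric and each candidate component metric and picks the strongest match. The helper names (`pearson`, `best_correlated`) and all sample values are assumptions made for this example, not product output.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # A constant series carries no correlation information.
    return cov / math.sqrt(var_x * var_y)


def best_correlated(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the candidate that correlates most strongly with the target."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Hypothetical samples taken at the same six points in time.
vm_response_ms = [5, 6, 30, 7, 28, 6]
components = {
    "CLPR": [40, 45, 100, 50, 100, 42],   # cache usage in percent
    "Port": [30, 31, 29, 30, 32, 30],     # port busy rate in percent
}

best, scores = best_correlated(vm_response_ms, components)
print(best, {name: round(score, 2) for name, score in scores.items()})
```

A high coefficient only indicates that two metrics move together; confirming the root cause still requires the chart comparison described above.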
Often, the performance problem is a recurring trend; for example, when monitoring certain infrastructure resources, you notice spikes in I/O activity every weekday at 3 PM. When you create a customized report, you discover this trend has persisted for six months. (In theory, you can review performance from months to years.) This capability to review past performance adds a historical element to deep dive analysis.
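To make the idea of a recurring trend concrete, the following sketch groups historical samples by weekday and hour and reports the time slots in which most samples exceed a threshold. It is an illustration only: the function `recurring_spike_slots`, the threshold, and the generated data are hypothetical and do not represent an Ops Center Analyzer report or API.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def recurring_spike_slots(
    samples: Iterable[Tuple[datetime, float]],
    threshold: float,
    min_ratio: float = 0.8,
) -> List[Tuple[int, int]]:
    """Return (weekday, hour) slots where spikes recur.

    A slot is reported when at least min_ratio of its samples exceed the
    threshold. Weekday 0 is Monday, as in datetime.weekday().
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for timestamp, value in samples:
        slot = (timestamp.weekday(), timestamp.hour)
        counts[slot] += 1
        if value > threshold:
            hits[slot] += 1
    return sorted(
        slot for slot, total in counts.items() if hits[slot] / total >= min_ratio
    )


# Hypothetical history: four weeks of hourly IOPS samples with a spike
# every weekday at 3 PM (hour 15).
start = datetime(2024, 1, 1)  # a Monday
history = []
for hour_offset in range(4 * 7 * 24):
    timestamp = start + timedelta(hours=hour_offset)
    is_weekday_3pm = timestamp.weekday() < 5 and timestamp.hour == 15
    history.append((timestamp, 5000.0 if is_weekday_3pm else 1000.0))

slots = recurring_spike_slots(history, threshold=3000.0)
print(slots)  # [(0, 15), (1, 15), (2, 15), (3, 15), (4, 15)]
```

The longer the history that is available, the more confidently a slot can be called recurring, which is why the ability to review months of past performance matters for this kind of analysis.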
Initiating a recovery plan to solve the performance problem
After establishing the correlation between the two charts, you return to the Ops Center Analyzer UI to initiate a recovery plan. You can enter the key metric, date, and time of the problem occurrence, and the target value for the metric. In this case, the problem component is the CLPR; the key metric is IOPS. You can specify conditions, then review the recovery plan generated by Ops Center Analyzer before running it.
After the recovery plan runs successfully, you can adjust your thresholds with new metric settings to monitor the user resources (in this case, the VM and the affected volume). At this stage, you have completed the MAPE loop.
E2E infrastructure topology view
The E2E topology view provides the detailed configuration of the infrastructure resources and lets you view the relationship between the infrastructure components. You can manually analyze the dependencies between the components in your environment and identify the resource causing performance problems. By using the topology maps, you can easily monitor and manage your resources. Use this view to monitor resources in your data center, including applications, virtual machines, servers, networks, and storage systems.
In the E2E view, each node represents a resource, and the connecting links represent the relationship between the infrastructure components. You can analyze a target resource and all associated resources. You can also view alerts associated with all related resources and trace the problem at the root level. The node-based E2E view helps you analyze the problem on the affected node and its impact on other resources. You can also open the Analyzer detail view UI to view a detailed performance report for a selected resource.
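The node-and-link model can be pictured as a small dependency graph. The sketch below is a simplified illustration, not the product's data model: the function `reachable`, the resource names, and the "depends on" link direction are assumptions. It walks the links from a base point to find the resources it depends on, and walks them in reverse to find the resources that rely on it.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def reachable(
    links: Iterable[Tuple[str, str]],
    base_point: str,
    reverse: bool = False,
) -> Set[str]:
    """Breadth-first search over (resource, depends_on) links.

    With reverse=False, returns everything the base point depends on.
    With reverse=True, returns everything that relies on the base point.
    """
    adjacency: Dict[str, Set[str]] = {}
    for source, target in links:
        if reverse:
            source, target = target, source
        adjacency.setdefault(source, set()).add(target)

    seen = {base_point}
    queue = deque([base_point])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(base_point)
    return seen


# Hypothetical topology: VM -> host -> switch -> storage system.
topology = [
    ("vm-1", "host-1"),
    ("vm-2", "host-1"),
    ("vm-3", "host-2"),
    ("host-1", "switch-1"),
    ("host-2", "switch-2"),
    ("switch-1", "storage-1"),
    ("switch-2", "storage-1"),
]

print(sorted(reachable(topology, "vm-1")))                  # path toward storage
print(sorted(reachable(topology, "host-1", reverse=True)))  # affected VMs
```

Following the links forward corresponds to tracing a problem toward its root, and following them in reverse corresponds to assessing the impact on other resources.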
Topology view components
The E2E view displays the topology related to the selected resources under the following default infrastructure groups:
- Consumer: The name of the consumer group to which the selected resource belongs and the details about the consumer grade level.
- Server: The associated server components, such as VMs and hosts.
- Network: The associated network components, such as switches.
- Storage: The associated storage components, such as volumes.
A number link is shown next to each resource icon. For example, when you select a storage subsystem as a target resource for analysis, and if 50 volumes belong to this storage subsystem, the value Volumes 50 is shown under the Storage infrastructure group. Click the Volumes link to open the Volumes - Storage window, which displays details about the volumes in the storage subsystem. From the Volumes list, select the priority of volumes that you want to analyze in the E2E view.
E2E view tool bar
The tool bar provides quick access to frequently used menu options and icons:

| Options | Description |
|---|---|
| Sparkline View | Navigate to the Sparkline view to analyze the performance of the base point resource and the related resources to identify the bottleneck. |
| Critical | Number of critical alerts in the topology view. |
| Warning | Number of warnings in the topology view. |
| Configuration Information | Number of indicators for configuration information. |
| Configuration Status | Information about drives, such as availability or the battery life of an SSD drive. |
| Copy Pair Information | Copy pair information for volumes. |
| VSM Information | Virtual storage machine information for copy pair volumes. |
| High Share Rate | Resource sharing percentage of a shared resource. Hover over the resource icons to display the share percentage for each resource. Resources with high share rate are potential bottleneck candidates. The Share Rate value is not displayed when you set a Hypervisor or Storage System as the base point of analysis. Select OFF to turn off this feature. |
| Storage and Server Views | The following topology views are supported: |
| Lock Highlight | Select a resource node and click Lock Highlight to highlight all related components in the topology view. The resource configuration remains highlighted until you release the lock on the resource node. This feature helps you understand the links between components and analyze the system configuration in detail. To release the lock, click Lock Highlight again. |
| Repaint | Select a resource node and click Repaint to move the resource from bottom-to-top or right-to-left to the prime position. Use this option to change the display order of resources. |
E2E view menu bar
| Menu bar items | Menu items and description |
|---|---|
| Show Detail button | Select a resource and click Show Detail. The performance summary report of the resource opens in a new window. You can also view the events related to the resource in the Events tab. |
| Show Report in Analyzer detail view | Click a resource icon and select Show Report in Analyzer detail view. The Analyzer detail view UI opens in a separate browser window. The resource tree opens to the selected resource, along with the latest available report in the Performance view. |
| Analyze Bottleneck menu | Select a resource and click Analyze Bottleneck. The Analyze Bottleneck Summary window opens. From the summary window, you can display the following tabs for the detailed analysis: |
| Action menu | |
| Set Flag menu | Select a resource and click Set Flag to flag a resource so you can analyze the flagged resource at a later point. To remove the flag, click Unset Flag. |
| Show Prediction | Select a resource and click Show Prediction to generate a report showing the predicted performance trend for that resource. The report is based on the risk profiles that you select. After the report is generated, go to the Predictive Analytics tab to view the results. |