z/OS New Age Performance Monitoring: Concepts & Directions
We need to accept several concepts that differ from the way we traditionally view performance data. Some of these fit nicely into the previously discussed historical perspectives, while some are more appropriate for how we should view performance today.
A processor (physical or logical) running at 100 percent utilization is not necessarily a problem. Workload Manager (WLM) tends to keep the processor busy. As long as Importance 1 and 2 workloads in their related service class periods are meeting their goals, why is this a problem? This does presume a heterogeneous workload and a good WLM service policy.
The WLM Performance Index (PI) is a principal indicator of system performance. We need to monitor it from both a system and a sysplex viewpoint. Are defined goals being met? Are the PIs appropriate for the workloads running in the associated service class periods? Is any subsystem’s performance suffering? Validating goals is an iterative process.
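As a rough illustration of watching the PI on an exception basis, the sketch below computes a PI for each service class period and reports only the Importance 1 and 2 periods that miss their goals. It assumes the usual definitions (actual divided by goal for an average response time goal, goal divided by actual for an execution velocity goal, so a PI above 1.0 means the goal is missed). The record layout, the service class names, and the sample numbers are hypothetical; percentile response time goals and discretionary work are not covered.

```python
from dataclasses import dataclass


@dataclass
class ServiceClassPeriod:
    name: str        # hypothetical service class name
    period: int
    importance: int  # 1 (highest) to 5
    goal_type: str   # "response_time" or "velocity"
    goal: float      # seconds for response time, percent for velocity
    actual: float    # measured value in the same unit as the goal


def performance_index(p: ServiceClassPeriod) -> float:
    """Return the PI; 1.0 or less means the goal is met."""
    if p.goal_type == "response_time":
        return p.actual / p.goal          # slower than the goal gives PI > 1
    if p.goal_type == "velocity":
        if p.actual == 0:
            return float("inf")           # no progress at all
        return p.goal / p.actual          # lower velocity than the goal gives PI > 1
    raise ValueError(f"unsupported goal type: {p.goal_type}")


def missed_goals(periods, max_importance=2):
    """Return (name, period, PI) for important periods that miss their goals."""
    return [
        (p.name, p.period, round(performance_index(p), 2))
        for p in periods
        if p.importance <= max_importance and performance_index(p) > 1.0
    ]


# Hypothetical sample data for one reporting interval.
sample = [
    ServiceClassPeriod("ONLINE", 1, 1, "response_time", goal=0.5, actual=0.4),
    ServiceClassPeriod("DBHIGH", 1, 2, "velocity", goal=60.0, actual=40.0),
    ServiceClassPeriod("BATCHLOW", 1, 4, "velocity", goal=20.0, actual=5.0),
]
print(missed_goals(sample))
```

In this sample only DBHIGH is reported. BATCHLOW also misses its goal, but it sits below the importance cutoff, which matches the idea that a busy processor is acceptable while the important work meets its goals.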
A system utilizing real storage at 100 percent is not necessarily a problem. WLM tends to keep real storage full. The real issue is paging: what type of paging is occurring, and who is paging? Many mission-critical business applications don’t tolerate any kind of paging. With real storage so inexpensive, it’s easy to add real storage to solve that kind of performance challenge. A system utilizing virtual common storage near 100 percent may be a serious problem. Many operating system components have moved their common storage requirements above the line and above the bar. System Queue Area (SQA) may be overflowing into Common Storage Area (CSA), which may not be a real problem. But when CSA becomes full, there could be a real problem.
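The storage reasoning above can be sketched as a small exception check. This is only an illustration: the snapshot layout, the job names, and the 85 percent CSA warning threshold are assumptions, not values from any monitor. Full real storage by itself raises nothing; paging by a mission-critical address space and a nearly full CSA do.

```python
def storage_exceptions(snapshot, critical_jobs, csa_warn_pct=85.0):
    """Return exception messages; full real storage alone is not reported."""
    findings = []
    # Paging matters only for the address spaces that cannot tolerate it.
    for job, page_ins_per_sec in snapshot["page_ins_per_sec"].items():
        if job in critical_jobs and page_ins_per_sec > 0:
            findings.append(f"{job}: paging at {page_ins_per_sec}/s")
    # SQA overflow into CSA is tolerated; a nearly full CSA is not.
    if snapshot["csa_used_pct"] >= csa_warn_pct:
        findings.append(f"CSA at {snapshot['csa_used_pct']}% used")
    return findings


# Hypothetical snapshot: real storage is full, one critical job is paging.
sample_snapshot = {
    "real_used_pct": 100.0,
    "page_ins_per_sec": {"CICSPROD": 3.0, "BATCH01": 50.0},
    "sqa_overflow": True,
    "csa_used_pct": 91.0,
}
print(storage_exceptions(sample_snapshot, critical_jobs={"CICSPROD"}))
```

BATCH01 pages far more heavily than CICSPROD in this sample, yet only CICSPROD is reported, because the question is who is paging rather than how much paging there is overall.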
Coupling facility performance is critical to overall system performance. A poorly configured, poorly tuned coupling facility will severely impact performance of the components and applications using it. The availability of multiple coupling facilities with sufficient capacity allows mission-critical applications to continue in outage scenarios. It’s important to understand who is using the coupling facilities and how well they’re servicing those users.
Cross-System Coupling Facility (XCF) performance is an important contributor to overall system performance. Poorly defined XCF transport classes will negatively impact performance of the system components and applications using them. Defining multiple transport classes with proper MESSAGELENGTH and buffers helps keep CPU usage in the XCFAS address space low. It’s important to understand who is using XCF and how well it’s servicing those users.
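To see why the defined message lengths matter, the sketch below uses a deliberately simplified model: each message is placed in the transport class with the smallest defined length that can hold it, and messages larger than every class are counted as oversized. This is not XCF's actual selection logic, and the class lengths and message sizes are hypothetical. It only illustrates how a sample of message sizes can be compared against a set of class definitions.

```python
from bisect import bisect_left
from collections import Counter


def assign_classes(message_sizes, class_lengths):
    """Map messages to the smallest class that fits; count oversized ones."""
    lengths = sorted(class_lengths)
    counts = Counter()
    oversized = 0
    for size in message_sizes:
        idx = bisect_left(lengths, size)  # first class length >= size
        if idx == len(lengths):
            oversized += 1                # no defined class is large enough
            idx -= 1                      # simplification: use the largest class
        counts[lengths[idx]] += 1
    return dict(counts), oversized


# Hypothetical message sizes (bytes) and class lengths.
sizes = [200, 900, 956, 4000, 8000, 20000, 70000]
per_class, too_big = assign_classes(sizes, class_lengths=[1000, 8000, 16000])
print(per_class, too_big)
```

A large oversized count, or one class carrying most of the traffic while another sits nearly empty, would suggest that the class definitions do not match the traffic actually being sent.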
DASD I/O should not be monitored at a device level (although device-level monitoring may still be applicable to robust tape environments). With large numbers of volumes in many of today’s data centers, I/O monitoring must occur on an exception basis. Monitoring at the higher storage controller level is often beneficial. The metrics to look for include:
• Contention for specific devices
• Who is waiting?
• Where’s the time being spent to do the I/O?
• Is WLM involved? (If not, maybe it should be.)
Parallel Access Volumes (PAVs), HyperPAVs, and I/O priority can help with I/O bottlenecks.
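Exception-based I/O monitoring can be sketched as follows. The example splits each device's response time into the familiar IOSQ, pending, disconnect, and connect components, reports only devices whose total exceeds a threshold, names the dominant component, and groups the results by storage controller. The record layout, the volume and controller names, and the 5 ms threshold are assumptions chosen for illustration.

```python
from collections import defaultdict

# Response time components, all in milliseconds.
COMPONENTS = ("iosq", "pend", "disc", "conn")


def io_exceptions(devices, threshold_ms=5.0):
    """Group devices over the threshold by controller, with dominant component."""
    by_controller = defaultdict(list)
    for dev in devices:
        total = sum(dev[c] for c in COMPONENTS)
        if total > threshold_ms:
            # The largest component shows where the I/O time is being spent.
            dominant = max(COMPONENTS, key=lambda c: dev[c])
            by_controller[dev["controller"]].append(
                (dev["volser"], round(total, 2), dominant)
            )
    return dict(by_controller)


# Hypothetical measurements for one interval.
devices = [
    {"volser": "PROD01", "controller": "CU01",
     "iosq": 6.0, "pend": 0.3, "disc": 0.5, "conn": 1.2},
    {"volser": "PROD02", "controller": "CU01",
     "iosq": 0.0, "pend": 0.2, "disc": 0.1, "conn": 0.7},
    {"volser": "WORK01", "controller": "CU02",
     "iosq": 0.1, "pend": 0.4, "disc": 7.5, "conn": 1.0},
]
print(io_exceptions(devices))
```

In this sample PROD01 is dominated by IOSQ time, which is queuing for the device inside the system and is the component that PAV aliases are meant to relieve. WORK01 is dominated by disconnect time, which points toward the storage controller instead.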
What’s Next?
Before moving into new directions for monitoring, we first need to incorporate the new age concepts just discussed. We need to:
• Display only relevant information and exceptions to the appropriate teams responsible for monitoring and managing our systems.
• Establish a new “expected client performance service” rating. Integrating more autonomic processes into our automation solutions will be important to the success of enterprise performance monitoring.

