RAC Detail View (Oracle Performance Analyst)

From RapidSQL
Jump to: navigation, search

Go Up to Oracle Performance Analyst Statistics

The Performance Analyst for Oracle has been upgraded to support for the monitoring of Oracle Real Application Clustering (RAC) environments (Oracle 9i and higher).

Oracle RAC allows a single physical Oracle database to be accessed by simultaneous instances of Oracle running across several CPUs. All statistics you see are listed by instance ID.

The RAC Detail view is divided into the following tabs:

Overview

The Overview tab is of the RAC Detail view divided into the following sections:

Key Ratio Analysis Pane

Database performance analysts typically use one of two methods for examining the performance levels of a database: ratio-based or wait/bottleneck-based analysis. Ratio-based analysis involves examining a number of key database ratios that can be used to indicate how well a database is running. Performance ratios serve as very good roll-up mechanisms for busy DBAs to use for at-a-glance performance analysis. Many DBAs have large database farms to contend with and can’t spend time checking detailed wait-based analysis outputs for each and every database they oversee. Succinctly presented performance ratios can assist in such situations by giving DBAs a few solid indicators that can be quickly scanned to see if any database needs immediate attention.

While there are many opinions as to what rules to follow, there are some standards that should always be adhered to. To start with, many of the formulas that make up ratio-based analysis must be derived from delta measurements instead of cumulative statistics. Many of the global ratios that a DBA will examine come from the v$sysstat performance view. This view maintains a count of all the occurrences (in the VALUE column) of a particular database incident (in the NAME column) since the database was brought up. For databases that are kept up for long periods of time, these values can grow quite large and impact how a particular ratio that a DBA may be looking at is interpreted. However, if delta statistics are used (taking, for a specified sampling period, the before and after counts of each statistic that make up a ratio), then an accurate and current portrayal of the various ratios can be had.

A final thing to remember about using ratio-based analysis is that, while there are a number of rules of thumb that can be used as starting points to begin the evaluation of database performance, DBAs must determine each database’s individual “personality”with respect to the key performance ratios. Some hard and fast rules simply do not apply to every database. The danger in using blanket ratio standards is that they can lead the DBA to haphazardly take action, which can at times contribute nothing to the situation, and sometimes even degrade performance.

The following ratios are used on the Performance Analyst RAC Overview page to succinctly communicate the general overall performance levels of the monitored database:

  • Cross Instance Cache Flush ratio
  • Ping Ratio
  • Local Buffer Access Percent
  • Local Read Percent
  • Global Buffer Busy Wait Percent

Instance Analysis Pane

As a DBA managing an Oracle RAC, you will oftentimes want a quick performance snapshot of the efficiency and activity across all your Oracle RAC nodes. The Instance Analysis section provides a fast picture of which nodes of your Oracle RAC appear overworked and those that are underutilized.

The following statistics are provided:

  • ID: The Oracle RAC instance ID
  • Name: The instance name
  • Sessions: The total number of sessions logged onto a RAC node
  • Physical I/O: A summation of all physical reads and physical writes for the RAC node
  • Logical I/O: A summation of all logical I/O (buffer gets, consistent gets, etc.) for the RAC node
  • Hit Ratio: An indicator of how often user requests for data are satisfied.
  • Memory Usage: The session's total memory usage against all session memory usage on the database.
  • Parses: The total number of parses experienced by the RAC node
  • Total CPU: The total CPU usage for the RAC node
  • Parse CPU: The amount of process CPU for the RAC node
  • Recursive CPU: The amount of process CPU caused by recursive calls on the RAC node
  • Other CPU: Total CPU usage minus parse and recursive activity. This can be a negative number if the Oracle kernel is not properly reporting total CPU usage
  • Disk Sorts: The total number of disk sorts for the RAC node
  • Memory Sorts: The total number of memory sorts for the RAC node
  • Rows Sorts: The total number of rows sorted on the RAC node
  • Commits: The total number of commits executed for the RAC node
  • Rollbacks: The total number of rollbacks executed for the RAC node
  • Executions: The total number of calls made on the RAC node
  • Physical Reads: The total number of physical reads produced on the RAC node
  • DB Block Gets: The total number of block gets produced on the RAC node
  • Consistent Gets: The total number of consistent gets produced on the RAC node
  • Consistent Changes: The total number of consistent changes produced on the RAC node

Bottleneck Analysis Pane

When an Oracle database is up and running, every connected process is either busy doing work or waiting to perform work. A process that is waiting can mean nothing in the overall scheme of things or it can be an indicator that a database bottleneck exists. And this is where wait-based or bottleneck analysis comes into play. DBAs used this form of performance analysis to determine if perceived bottlenecks in a database are contributing to a performance problem.

Bottleneck analysis is a valid method of measuring performance because it helps a DBA track where a database has been spending its time. If latch contention or heavy table scan activity has been dragging a database’s performance down, a DBA can use bottleneck analysis to confirm the actual root cause. Once one or more wait events or other bottlenecks have been pinpointed as possible performance vampires, the DBA can drill down and oftentimes discover a fair amount of detail about what sessions and objects are causing the problem.

For wait event analysis to be properly carried out, it is imperative that the timed_statistics initialization parameter be set to TRUE. By default this parameter is set to FALSE, which disallows the collection of wait times for each wait event defined to the Oracle engine. For one to really understand the impact of wait events on database performance, the DBA not only needs to discover what the database is or has been waiting on, but the durations of the waits. Having both measures in hand allows a complete picture to be formed regarding the magnitude of wait-initiated performance degradations. Almost all Oracle experts now agree that allowing the collection of timing statistics adds little if anything to database overhead, so setting timed_statistics to TRUE should not be a worry. The Performance Analyst home page identifies the top system and session waits that are currently a cause of lessened performance.

When using bottleneck analysis, you cannot rely only on information contained in the wait event views that Oracle provides. For example, an object may attempt to extend into another extent of space in a tablespace and yet be denied if no such free space exists. Such a failure will not be reflected in any wait event, but still represents a very real bottleneck to the database. In the same way that you cannot depend on only a few ratios to properly carry out ratio-based performance analysis, an administrator must include other statistical metrics in their overall bottleneck analysis framework to obtain an accurate performance risk assessment. Performance Analyst helps identify bottlenecks in your database that fall outside of pure wait events so you can get a total picture of all stoppages in your system.

For an Oracle RAC, Performance Analyst highlights the top RAC-related waits as well as conflicts that may be slowing down response times. Here you see statistics for Top Rac-Related Waits and Conflicts. Within each section, you find:

Top Rac-Related Waits

When viewing wait statistics, there are several levels of detail that a DBA can drill down into. The first level is the system view, which provides a global, cumulative snapshot of all the waits that have occurred on a system. Viewing these numbers can help a DBA determine what wait events have caused the most commotion in a database thus far. The Top RAC-related bottlenecks section display identifies the top waits that have occurred on the node that Performance Analyst is currently connected to.

Appendix A in the Oracle Reference manual contains a listing and description of every current wait event defined in Oracle. DBAs unfamiliar with what each event represents should keep this listing close by as they examine wait-based event metrics. For example, a “db file scattered read” event is typically indicative of table scan operations. If many of these events are observed, then the DBA should begin to see if large table scans are occurring in the database. Like the “db file scattered read” event, each wait event has its own meaning and individual end-resolution diagnosis.

After looking at system-level wait activity, a DBA can drill down further to discover what current connections may be responsible for any reported waits that are being observed at the system level. Performance Analyst reports on historical and current wait events at the session level, making this investigation easy to accomplish.

Conflicts

There are a number of conflicts that can occur over an Oracle RAC that can reduce response times. The Conflicts section highlights some of the more common conflicts and provides a count of how many times each has occurred.

The conflicts listed include:

  • Library Cache: How many library cache invalidations have occurred on the node that Performance Analyst is connected to.
  • Data Dictionary Cache: How many data dictionary cache invalidations have occurred on the node that Performance Analyst is connected to.
  • Forced Writes (Oracle 9.2 and above): Number of times a block was written to cache because an instance had used the block, but another instance had requested the lock on the block in a conflicting mode.
  • Pings: How many cross instance writes occurred
  • Cache Timeouts: The number of global cache convert timeouts that have occurred.

Workload Analysis Pane

When your phone starts ringing with complaints of performance slowdowns, one of the first things you should get a handle on is:

  • Who is currently connected to the database
  • What resources are they using
  • What are they currently executing

The Workload Activity Analysis section of the Performance Analyst home page provides insight into the leading resource hogs across all instances of an Oracle RAC. Drill down’s are available so you can easily get detailed information into what each leading session is currently involved with.

If you are seeing a slowdown in your database, and can't seem to find a root cause, one thing to examine is the resource consumption of the leading sessions on a system. Often you will find one or a handful of users making life miserable for everyone else. They accomplish this by overwhelming the I/O capabilities of Oracle (through untuned queries or runaway batch jobs) or hammering the CPU or memory structures of the database and server.

The Workload Analysis section presents:

  • Top Physical I/O Process
  • Top Logical I/O Process
  • Top Memory Process
  • Top CPU Process

For each of these, you see the SID, Instance ID, Username, %Used.

If any one session appears to be using more than 50% of a total resource (CPU, memory, etc.), then you should drill down into the session to find out what it is currently executing.

Memory

The Memory tab of the RAC Detail view is divided into the following sections:

Key Ratio Analysis Pane

Database performance analysts typically use one of two methods for examining the performance levels of a database: Ratio-based or wait/bottleneck-based analysis. Ratio-based analysis involves examining a number of key database ratios that can be used to indicate how well a database is running. Performance ratios serve as very good roll-up mechanisms for busy DBAs to use for at-a-glance performance analysis. Many DBAs have large database farms to contend with and can’t spend time checking detailed wait-based analysis outputs for each and every database they oversee. Succinctly presented performance ratios can assist in such situations by giving DBAs a few solid indicators that can be quickly scanned to see if any database needs immediate attention.

While there aremany opinions as to what rules to follow, there are some standards that should always be adhered to. To start with, many of the formulas that make up ratio-based analysis must be derived from delta measurements instead of cumulative statistics. Many of the global ratios that a DBA examines come from the v$sysstat performance view. This view maintains a count of all the occurrences (in the VALUE column) of a particular database incident (in the NAME column) since the database was brought up. For databases that are kept up for long periods of time, these values can grow quite large and will impact how a particular ratio that a DBA may be looking at is interpreted. However, if delta statistics are used (taking, for a specified sampling period, the before and after counts of each statistic that make up a ratio), then an accurate and current portrayal of the various ratios can be had.

A final thing to remember about using ratio-based analysis is that, while there are a number of rules of thumb that can be used as starting points to begin the evaluation of database performance, DBAs must determine each database’s individual “personality” with respect to the key performance ratios. Some hard and fast rules simply do not apply to every database. The danger in using blanket ratio standards is that they can lead the DBA to haphazardly take action, which can at times contribute nothing to the situation, and sometimes even degrade performance.

The following ratios are used in the Memory Key Ratio Analysis section to succinctly communicate the general overall performance levels of the monitored database:

  • Buffer Cache Hit Ratio: How often requested information is found in memory versus on disk.
  • Library Cache Hit Ratio: Shows how often SQL code is being reused by other database users versus the number of times a SQL statement is broken down, parsed, and then loaded (or reloaded) into the shared pool.
  • Latch Miss Ratio: Defines the number of times a process obtained a willing-to-wait latch versus missing the attempt.

Memory Analysis Pane

The Memory Analysis section displays the memory configurations for each node currently up in the RAC. The following information is shown:

Name: The name of the instance.

  • DB Buffers (MB): The default memory cache that maintains data blocks when they are read from the database. If the DBA doesn't specifically place objects in another data cache (which will be covered next), then any data requested by clients from the database will be placed into this cache. This memory area is controlled by the b_cache_size parameter.
  • Log Buffer Size (MB): This area buffers modifications that are made to the database before they are physically written to the redo log files. The log_buffer configuration parameter controls this memory area.
  • Shared Pool Size (MB): This familiar area holds object structure as well as code definitions, and other metadata. Setting the proper amount of memory in the shared pool assists a great deal in improving overall performance with respect to code execution.
  • DB Caches (MB): Beginning in Oracle9i, a DBA can create tablespaces whose blocksize differs from the overall database blocksize. When data is read into the SGA from these tablespaces, their data has to be placed into memory regions that can accommodate their special block size. Oracle9i and above has memory settings for 2K, 4K, 8K, 16K, and 32K caches. The configuration parameter names are in the pattern of db_nk_cache_size. For each RAC node, you see the following statistics:
    • Buffer Cache Hit Ratio
    • Library Cache Hit Ratio
    • Latch Miss Ratio

Library Cache Analysis Pane

The Library Cache Analysis section provides a drill down view into invalidations that have occurred across the cluster. The following information is shown:

  • ID: The instance ID.
  • Name: The instance name.
  • Namespace: The area of the library cache.
  • DLM Lock Requests: The number of get requests for instance locks.
  • DLM Pin Requests: The number of pin requests for instance locks.
  • DLM Pin Releases: The number of release requests for pin instance locks.
  • DLM Invalidation Requests: The number of get requests for invalidation instance locks.
  • DLM Invalidations: The number of invalidation pings received from other RAC instances.

Data Dictionary Cache Analysis Pane

The Data Dictionary Cache Analysis section provides a drill-down view into invalidations and conflicts that have occurred across the cluster. The following information is shown:

  • ID: The instance ID.
  • Name: The instance name.
  • Parameter: The specific object area of the dictionary cache.
  • DLM Requests: The number of lock manager requests.
  • DLM Conflicts: The number of lock manager conflicts encountered.
  • DLM Releases: The number of lock manager releases.

Ping Activity

Pinging is the process where one Oracle Instance requests another to write a set of blocks from its SGA to disk so it can obtain it in exclusive mode. To move a data block from one instance's SGA to another is a slow process. The Ping Activity tab of the RAC Detail view displays several levels of specificity, starting from general to specific. The Ping Activity tab is divided into the following sections:

Ping by Instance - shows how often pinging has occurred across all the monitored instances.

Ping by Datafile - shows the number of blocks pinged from the shared set of RAC datafiles.

Ping by Object - provides a drill down view into the amount of specific ping activity across the shared objects and datafiles of a RAC:

  • Owner
  • Object Name
  • Partition Name
  • Block Number
  • Segment Type
  • Kind
  • Forced Reads
  • Forced Writes
  • Datafile

Locks

The Locks tab of the RAC Detail view is divided into the following sections:

Lock Convert Summary - The Lock Convert Summary view provides efficiency statistics regarding the speed at which the RAC acquires and converts locks across the instances. You see the average get and convert times for each RAC node.

DLM Lock Activity - The DLM (Distributed Lock Manager) Lock Activity view provides detailed information on the number of lock conversions that have occurred across the RAC.

PCM Lock Summary - The PCM (Parallel Cache Management) Lock Summary provides a summary count of PCM locks that are releasing, acquiring, or are invalid.

Latch Detail - Protecting the memory structures in Oracle's SGA are latches. Latches ensure that one and only one process at a time can run or modify any memory structure at the same instant. Much more restrictive than locks (which at least allow for some collective user interaction), latches have no queuing mechanism so either you get it or you don't and are forced to continually retry.

The latch detail tab of the contention detail view presents a detailed view of latch activity. Information presented includes:

  • Instance Name: The name of the instance
  • Name: The name of the latch
  • Gets: The total number of times the latch was requested by a process
  • Misses: The total number of failed attempts to acquire the latch on the first attempt
  • Immediate Gets: The total number of no-wait requests for a latch
  • Immediate Misses: The total number of failed no-wait attempts to acquire the latch on the first attempt
  • Immediate Sleeps: The total number of requests that "paused" while waiting for a latch

Common indicators of latch contention are a latch miss ratio (which records willing-to-wait mode latch requests) and a latch immediate miss ratio (which records no-wait mode latch requests). These statistics reflect the overall health of how often latch requests were made and satisfied without waiting.

Here are a few latch contention situations that you should recognize and get to the bottom of quickly:

  • Cache buffer chain latch: This latch is responsible for protecting paths to database block buffers in the buffer cache. Very high I/O loads tend to cause contention for this latch. You can alleviate contention somewhat by adding more buffers to the cache (through the db_block_buffers parameter) or by adding more LRU latch chain latches with the db_block_lru_latches parameter.
  • Library cache latches: Protects cached SQL statements in the library cache area of the Oracle shared pool. Contention for this latch is the usual result of literals being used for SQL statements instead of bind variables.

Sessions

The Sessions tab of the RAC Detail view is divided into the following sections:

User Session by Instance - The User Session by Instance section provides a graphical count of the number of connected sessions for each node in the RAC.

Blocks by Instance - The Blocks by Instance section provides a graphical count of the number of blocked sessions for each node in the RAC.

User Waits by Instance - The User Waits by Instance section provides a graphical count of the number of historical session waits (non-idle) for each node in the RAC.

Top Sessions - When a system experiences heavy activity, sometimes you’ll find that all the user connections are contributing somewhat equally to the overall load. Often, however, one or two user connections are causing most of the activity. It may be that a large batch load or other typical process is running that is perfectly OK for your system. Or it may be a runaway process or other rogue connection that needs to be tracked down and possibly eliminated.

The top sessions control displays information regarding all key metrics for all current sessions on a database. Information presented includes:

  • SID: The system ID of the process
  • Instance: The instance the session is connected to
  • Username: The username of the process
  • O/S ID: The operating system ID of the process
  • Machine Name: The workstation where the process originated
  • Logon Time: The date/time the process logged on to Oracle
  • Tot Physical I/O: A summation of all physical reads and physical writes for the process (includes direct)
  • Tot Physical I/O: A summation of all logical I/O (buffer gets, consistent gets, etc.) for the process
  • Parses: The total number of parses the process has caused
  • Total CPU: The total CPU usage for the process
  • Parse CPU: The amount of process CPU caused by parse activity
  • Recursive CPU: The amount of process CPU caused by recursive calls
  • Other CPU: Total CPU usage minus parse and recursive activity. This can be a negative number if the Oracle kernel is not properly reporting total CPU usage
  • Disk sorts: The total number of disk sorts caused by the process
  • Memory sorts: The total number of memory sorts caused by the process
  • Rows sorted: The total number of rows sorted by the process
  • Commits: The total number of commits executed by the process
  • Rollbacks: The total number of rollbacks executed by the process
  • Physical reads: The total number of physical reads produced by the process
  • DB block gets: The total number of block gets produced by the process
  • Consistent gets: The total number of consistent gets produced by the process
  • Consistent changes: The total number of consistent changes produced by the process