ASH - Active Session History
ASH stands for "active session history" and it's official introduction was in Oracle 10g. Oracle 10g introduced ASH and Oracle 11 builds upon that foundation. ASH is a radically new way to gather performance data. “Radical?” You might ask. Yes, “radical” because ASH only takes samples of data once a second and what happens between samples is lost. The lost data bothers many at first but this unease quickly passes when the power of ASH is understood. ASH provides information necessary to solve some of the toughest performance problems. Before ASH this information was often too difficult and too expensive to get. Previous performance gathering techniques such as STATSPACK (and continued with the even more expensive AWR whose report is almost the same as STATSPACK) lacked the information to solve many performance bottlenecks. Session tracing provides much of the same information as ASH but is much more costly and has to be set up before a problem arises, proving to be impractical in many situations. Starting in Oracle 10g, ASH is always running, sampling every second, and saved for a week on disk (configurable), thus providing the data need to identify and solve a problem that may have only lasted seconds and days ago.
ASH is way of sampling the state of sessions connected to an Oracle database in order to monitory database load as well as drill down into any performance issues that may arise. ASH is a technology that can be applied to any system with connected users such as another database like SQL Server or an operating system or even applications and application servers. Oracle is the first system that I know of using ASH to collect performance data. ASH by default on an Oracle database, samples once every second and logs some 30 to 60 (depending on version) pieces of information on any session that is active at the time of sampling. One hurtle to accepting ASH as a model had been the question “how fast do we have to sample to get worth while data”. There has been a tendency (I admit having felt this) to want to sample super fast like 10-100 times a second (easily possible with a C program) but experience has shown that most monitoring and problem resolution can be easily accomplished with 1 second sampling. Higher rate of sampling serve only for rare cases that are better left to tracing. For 99.9% of the time, sampling once a second is sufficient to both show clearly the load on the system as well as provide detailed information on system activity that would otherwise be too difficult or prohibitively expensive to collect (due to the load caused by the data collection itself). ASH is the collection of active session stated sampled every second. An active session is any session that has made a call to the database.
The commitment that Oracle took to ASH required a break from the compulsive quest to gather all the statistics possible at a 100% accuracy which was a futile and limiting strategy. By letting go the drive to collect everything all the time accuratly, Oracle was able to collect more information, more powerfully and with less over head. How can less be more? ASH took an understanding that the most powerful performance data can be collected not by collecting every thing but by collecting the most important information in a particular manor.
Instead of collecting the exact statistics, ASH is a statistical approximation to the statistic counters. ASH samples every second the session states of all active sessions as well as the SQL the session is executing. This sampling produces a statistical approximation that is cheaper to collect and the multidimensional data allowing new and previously impossible diagnostics.
Intractable Performance Data Collection Problem Previous to ASH, Oracle performance collections tried to collect as much statistics as often as possible. This produced a couple of problems
- 1. Heisenberg affect
- 2. Overwhelming amount of data
For example, database tools tried to collect statistics on all the Sessions, SQL and Objects in the database.
- Session information came from v$session.
- SQL information came fro v$sql
- Object information came from v$segstat (as of Oracle 9i, before 9i it wasn't available)
The collection of this information amounted to massive amounts of work. For example on a system with 150 sessions the number of values to collect would be 150 sessions x (800 wait events + 200 statitics) = 150,000 values to collect every collection ! That's just session information. For SQL, there could be tens of thousands and for each statement we might want to collect a couple dozen statistics. Same for objects - there could be 1000s of objects and for each we'd want to collect a dozen of statistics. The problem quickly becomes intractable.
There are optimizations of the collection that allowed some people to doggedly try and collect these kind of statistics. For example for Sessions, SQL and Objects we could filter out any statistics that were zero but for any statistics that wasn't zero we'd have collect it because we wouldn't know if it had changed since we last collected it. For sessions, Oracle does have a counter that tells whether the session has been active since we last collected but SQL and Objects don't have any such counter
- sessions x (# wait events + statistics)
- Example (150 x (800+200) = 150,000 )
- Could be 10000s
- Takes out latches that compete with other sql executions
- V$segstat 9i+
- Could be 1000s of objects
- How about other things like v$filestat, etc etc?
The solution? Every second query v$session which is an inexpesive view onto the actual C structures in Oracle shared memory. Only collect information for sessions that are active at the time of collection. Collecting only active sessions is a natural filter for everything we want to filter. This filters out not only unwanted sessions, but also serves as a guide to the SQL,Object and Files we also want to collect information on. We gather not only session information and state, but also gather what SQL they were executing, files and objects they were accessing, honing in directly to the information of most interest to us.
- Intelligently Collects Data
- Samples once a second
- Collects active sessions only
- Collects the SQL being executed by the session and related information such as objects accessed, files read from etc
- More activity, more data collected
- Less activity, less data collected
- collected everything
- Obfuscated the problem, too many statistics too late
- Too Granular – once an hour ?! Give me a break
Sample Every Second
- Knowing everything – impossible and overwhelming
- Knowing enough – possible and manageable
Sampling is like taking a motion picture. We miss what happens between pictures but we get enough to know exactly what happened.
In the above diagram, the vertical lines represent the instant samples are take. Samples are taken once a second, thus for long running queries, over a second, we definitely capture the information. For short queries that are run often, we might miss a lot but we will capture a lot. For short queries that are rarely run, we will miss most of them. This kind of capture filters out queries that have little consequence on database load and focuses in on those queries that are important and put load on the database either because they are resource intensive or because they are run so often that they have a cumulative impact on resources.
- If it happens a lot or for long ... it will be captured by ASH.
Wait events can be classified into 4 major groups
On Oracle databases the classification is easy thanks to a table called V$EVENT_NAME which has a filed "WAIT_CLASS" that tells the kind of wait.