IBConsole - Performance Guidelines
Go Up to IBConsole
Contents
Overview of Performance Guidelines
Logical Optimization
Performance of the database begins when the logical structure is designed. Normalization of tables, database page size, and indices are critical components of tuning.
Physical Optimization
Database tables by default should be normalized to 3rd Normal Form. There are instances, especially when dealing with mass amounts of data, which de-normalizing the table will provide greater performance because all the data are contained in a single table. Query design will be covered later, but it should be obvious that it is much faster to access a single table versus multiple tables.
Analysis Tools
Use an InterBase SQL tool, ISQL command line utility or WISQL GUI utility, to define and/or extract the current physical definition of the database.
Use the InterBase Server Manager to examine how the physical definition of the database affects storage of data within the database file. From the Server Manager menu bar select Select Tasks > Database Statistics to open a pop-up window and select View > Database Analysis to see the number of pages allocated and their fill percentages.
How the Optimizer Works
The purpose of the optimizer is to select the most inexpensive method of retrieving information requested in a query based on the current state of the database. InterBase performs the optimization for each query the first time that query executes for the database connection. If the application detaches and reattaches to the database then all the queries for that database are optimized when they are executed again. The benefits of this method are that the queries will be optimized based on the current state of the database, i.e., how many and what types of indices are defined, how useful those indices are, and approximately how many rows in each table. The optimization strategy or plan, is a series, from left to right, of retrievals from each individual table with the most expensive being the outer most retrieval and the most inexpensive being the inner most retrieval. To help clarify how this is done, we will use the following query on the employee.gdb database as an example:
SELECT * FROM employee e, department d, job j WHERE e.dept_no = d.dept_no AND e.job_code = j.job_code AND e.job_country = j.job_country AND e.job_grade = j.job_grade AND e.dept_no < 120 Retrieve from employee Retrieve from jobs Retrieve from department
The optimizer performs these seven major steps:
- 1. Decompose the Boolean expression into a set of conjuncts or associations. Boolean expressions being equalities, inequalities, etc. From the example, we would end with the following conjuncts:
e.dept_no = d.dept_no e.dept_no < 120 e.job_code = j.job_code e.job_country = j.job_country
- 2. Distribute the equalities and inequalities.
- If a=b and a=c then b=c is added to the set of conjuncts. The same applies for a constant, e.g., if a=5 and a=c then c=5 is added to the set of conjuncts. From the example, we would only create one new conjunct:
d.dept_no < 120
- 3. Enumerate those conjuncts that can use an index to create a stream.
- A stream is a set of records, such as a relation or a relation with restrictions on it. From the example, we would create the list of streams from the indices defined for employee, department, and job:
- Indices:
Employee: RDB$FOREIGN8: duplicate on field dept_no RDB$FOREIGN9: duplicate on field job_code, job_grade, job_country Job: RDB$PRIMARY2: unique on field emp_no MINSALX: duplicate on field job_country, min_salary MAXSALX: duplicate on field job_country, max_salary RDB$FOREIGN3: duplicate on field job_country Department: RDB$PRIMARY5: unique on field dept_no
- Streams:
Employee with index RDB$FOREIGN8 Employee with index RDB$FOREIGN9 Job with index RDB$PRIMARY2 Job with index MINSALX Job with index MAXSALX Job with index RDB$FOREIGN3 Department with index RDB$PRIMARY5
- 4. Enumerate all possible permutations of indexed joins from the streams above into "rivers."
- A "river" is a combination of multiple streams joined over the indexed fields. From the example, we would create these permutations:
- Employee to Jobs over job_code, job_grade, job_country
- Employee to Department over dept_no
- Employee to Job over job_code, job_grade, job_country to Department over dept_no
- Employee to Department over dept_no to Job over job_code, job_grade, job_country
- 5. Select the longest river.
- Select the longest river (assumed to be the cheapest); if two rivers are of equal length, then do a cost estimate to select the cheaper river. The cost estimate depends on several variables: whether there is an index, the selectivity (estimate of usefulness) of that index, whether there are selection criteria, the cardinality (approximate number of values) of the underlying relation, and whether the stream needs to be sorted. For example, when retrieving an indexed equality, the cost is estimated as: cardinality of base relation * selectivity of index + overhead of index retrieval.
- The optimizer approximates the cardinality to be equal to the number of data pages for the relation divided by the maximum number of records per page.
- Selectivity is equal to the estimated number of distinct values divided by the cardinality. The cardinality can change over time when new pages are allocated and records are stored, modified, or deleted. The selectivity is only set on index creation. There is a command that will re-compute the selectivity for non-unique indices:
SET STATISTICS INDEX RDB$FOREIGN8;
- From the example, there are two rivers with the same length, rivers three and four. Because they are the same length, we need to compute the cost associated with each river. River three is chosen because doing an index lookup on Job will be the least expensive retrieval as compared to Department. That is because there are only fourteen unique job_codes compared to twenty-four unique departments. Note that the selectivity of a unique index is "perfect", i.e., looking up a value will find only one record with that value. A lookup into Employee is the most expensive because it uses a duplicate index and there are fifty-nine duplicate departments.
- 6. Remove the selected river from the list and repeat until all rivers have been selected.
- 7. Finally, take the set of rivers selected and try to generate sort merge joins between them; otherwise, do a cross product.
- A sort merge means that both streams are sorted and the results merged. With the streams sorted, the database engine can scan through both streams just once to form the join. Otherwise, it must iterate through the second stream for each record in the first stream. A cross product means that every record in the first stream joins with every record in the second stream.
- This should make it very clear that properly designed indices are a major influence on performance. What may not be so clear is that the selectivity of an index is only computed for a duplicate index. If the values in the index are changed, i.e., updated, deleted, or new ones inserted, it is advised to re-compute the selectivity periodically. This would affect only those applications that attach to the database after re-computing. Any existing queries will still be optimized using the old selectivity.
- Database pages allocated for the table also impact the cardinality. If you load a great deal of data into a relation, delete a substantial number of rows, then you may have many pages that are empty or nearly empty. This will significantly affect the computation of the cardinality. The only method to correct this is to backup and restore the database.
- So in design queries, it is very important that the primary and foreign keys have an index defined because these are the fields that tables will be joined across. The program ISQL is an excellent environment for prototyping queries. Make sure that the data returned are correct. Partial Cartesian products are much more difficult to detect then full ones.
To generate the following plan:
PLAN JOIN (E INDEX (RDB$FOREIGN8), J INDEX (RDB$PRIMARY2, MINSALX, MAXSALX, RDB$FOREIGN3), D INDEX (RDB$PRIMARY5))
Refer to SET PLAN to display the optimizer plan computed for a query. You can write a custom PLAN to be included with a SELECT statement, this PLAN will supercede the InterBase optomizer PLAN for the query.