Database Physical Properties

This section covers the physical properties of a database: page size, multi-file databases, shadows, and database statistics.

Database Page Size

The database page size determines how much data is retrieved in one logical access to the database. The page sizes permitted by InterBase are 1KB, 2KB, 4KB, and 8KB; the default is 1KB. A single logical access may involve multiple physical accesses. For example, on most UNIX systems the default number of bytes retrieved in a single physical access is 512, so with a 1KB page size two physical accesses occur for every logical access. There is therefore a tradeoff between the amount of data transferred per logical access and the cost of the physical I/O. The proper page size depends on your database requirements: Is the database mostly read, update intensive, or a combination of both? Is accessing BLOB or array data a priority? What is the record size of your main table, the table that will be accessed most often?

Database page size also influences the tuning of indices. The Database Statistics section below describes performance monitoring with the command-line tool GSTAT or the Server Manager GUI, including an analysis of the layout of each index. If an index has too many levels, increasing the page size reduces the depth of the index structures and usually improves performance. A good rule of thumb is to keep the indices on the main table to three or fewer levels.

Another effect of increasing the database page size is to implicitly increase the InterBase cache: the cache is allocated as a number of page buffers, so with larger pages the same number of buffers occupies more memory.
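
The buffer count itself can be adjusted with the GFIX command-line utility. The line below is only a sketch, assuming the standard -buffers switch and the default SYSDBA/masterkey credentials, which may differ in your installation:

gfix -user SYSDBA -password masterkey -buffers 2048 employee.gdb

With 2048 buffers, a 1KB page size yields roughly a 2MB cache, while an 8KB page size yields roughly 16MB for the same number of buffers.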

You can specify a page size when the database is created. For example, the statement,

CREATE DATABASE "employee.gdb" PAGE_SIZE 2048;

creates a single-file database with a page size of 2048 bytes. If you need to change the page size after the database has been created, back up the database and then restore it with a different page size, using the Server Manager Database Restore dialog box.
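
The same backup-and-restore cycle can be performed from the command line with the GBAK utility. The two lines below are only a sketch, assuming the standard -b (backup), -c (create), and -p (page size) switches and the default SYSDBA/masterkey credentials:

gbak -b -user SYSDBA -password masterkey employee.gdb employee.gbk
gbak -c -p 4096 -user SYSDBA -password masterkey employee.gbk employee.gdb

The restored database is rebuilt with a 4096-byte page size; all data and indices are written onto the new pages.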

Multi-File Databases

A database can also be made up of several files. This lets you make effective use of the available disk space on multiple volumes, and it allows a database to grow beyond the operating system's limit for a single file. Users always refer to the first file, the database name, and never need to know about the secondary files. Multiple files do not, however, let the DBA control where individual objects in the database are placed: you cannot assign relation A to the first file, relation B to the second file, and the indices for both to the third file. The files are used and filled in sequential order and can contain data from any table or index in the database. In fact, as data grows over time, the pages used by individual tables and indices are likely to become roughly interleaved.
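
A multi-file database can also be defined when it is created. The statement below is a sketch with illustrative file names and page numbers; each FILE clause names a secondary file and the page at which it starts:

CREATE DATABASE "employee.gdb" PAGE_SIZE 2048
  FILE "employee2.gdb" STARTING AT PAGE 10001
  FILE "employee3.gdb" STARTING AT PAGE 20001;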

You can add new files to the database without taking the database off-line or interrupting users who are doing work. One reason to do this is if your database is growing and threatens to outgrow the disk volume it resides on. Adding a file means that when the primary database file fills up, subsequent data is automatically stored in the secondary file(s). Below is an example of the ISQL command to add a second file to the database; with this statement, the primary database file tops off at 50,000 database pages.

ALTER DATABASE ADD FILE "second_file.gdb" STARTING AT 50001;

If you need to rearrange the distribution of pages in a multi-file database, back up the database and then restore it, specifying the secondary files and their attributes. Use the Server Manager Multi-File Database Restore dialog box.
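
The command-line equivalent is a GBAK restore that lists each database file followed by its length in pages. The line below is a sketch with assumed file names and sizes and the default SYSDBA/masterkey credentials:

gbak -c -user SYSDBA -password masterkey employee.gbk employee.gdb 50000 employee2.gdb

Here the primary file employee.gdb is limited to 50,000 pages and subsequent pages go to employee2.gdb.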

Database Shadows

Another physical property is the ability to create shadows of the database. A shadow is an exact, page-for-page duplicate of the database. The main reason for shadows is to protect against hardware failure. First, set up one or more shadows on other disk partitions or, better still, on other machines. If the primary disk or server fails, users can reconnect to the shadow after the DBA has activated it. This is much quicker than restoring from a backup: users can be working normally in minutes rather than hours.
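
A shadow is created with the CREATE SHADOW statement. The line below is a minimal sketch, assuming shadow set number 1 is free and using an illustrative file name; activating a shadow after a failure is done outside SQL, for example with the GFIX utility:

CREATE SHADOW 1 "employee.shd";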

Shadows can also be composed of multiple files just like normal databases. The files comprising a shadow are not required to match the sizes or filenames of the files comprising the master database.
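
A multi-file shadow follows the same pattern; the lengths and file names below are purely illustrative:

CREATE SHADOW 2 "employee_shadow1.shd" LENGTH 10000
  FILE "employee_shadow2.shd";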

The major drawback to shadows is that they increase the number of writes the database server must perform. With one shadow, every write is duplicated; with two shadows, every write is tripled. If the shadow is located on another machine and the writes go through NFS, each write takes even longer. Here the tradeoff is between I/O performance and data protection.

Database Statistics

The InterBase Server Manager (or the GSTAT console utility) can retrieve database statistics as they relate to the physical layout of the database file(s). The output from the Database Statistics option will look similar to this:

EMPLOYEE (34)
               Primary pointer page: 251, Index root page: 252
               Data pages: 5, data page slots: 5, average fill: 68%
               Fill distribution:
                                0 - 19% = 0
                               20 - 39% = 1
                               40 - 59% = 0
                               60 - 79% = 2
                               80 - 99% = 2

Index NAMEX (1)
               Depth: 1, leaf buckets: 1, nodes: 42
               Average data length: 15.00, total dup: 0, max dup: 0
               Fill distribution:
                                0 - 19% = 0
                               20 - 39% = 0
                               40 - 59% = 0
                               60 - 79% = 0
                               80 - 99% = 1

Index RDB$PRIMARY7 (0)
               Depth: 1, leaf buckets: 1, nodes: 42
               Average data length: 1.00, total dup: 0, max dup: 0
               Fill distribution:
                                0 - 19% = 0
                               20 - 39% = 1
                               40 - 59% = 0
                               60 - 79% = 0
                               80 - 99% = 0
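
A report like the one above can also be produced from the command line. The line below is only a sketch, assuming the standard GSTAT data and index switches, whose names may vary by version:

gstat -data -index employee.gdb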

The output lists information on tables and indices, sorted alphabetically by table name. The most interesting information for a table is the number of data pages and the average fill. Usually this information matters for only one table: the main table of the database, the one that is read and/or updated constantly. If the average fill is below 60 percent, try backing up and restoring the database. If the average fill is still low, it might be advisable to increase the database page size to the next value.

The information on indices is more complicated. Essentially the only field you do not need to check is leaf buckets. InterBase uses a variant of B-tree indices, and the depth field refers to the depth of the tree: the number of levels down from the top bucket. Normally this value should be three or less. If it is greater than three, rebuild the indices. The commands to do this in ISQL are:

ALTER INDEX custnamex INACTIVE;
ALTER INDEX custnamex ACTIVE;

If the depth does not decrease then it might be advisable to increase the database page size to the next value.

Note:
The only way to rebuild indices defined with the Declarative Referential Integrity syntax is to back up and restore the database.

Nodes is the total number of data values in the index. Total dup is the number of duplicate values, and max dup is the largest number of duplicates for a single value. Average data length is the total number of compressed bytes for all data values divided by the number of nodes. The data values are both prefix and postfix compressed; postfix compression uses run-length encoding. Prefix compression is best explained with an example. Consider the index custnamex, an ascending index on the field customer, a text field of at most twenty-five characters. Given the data values:

AA Auto Rentals
AAA - California
AAB Incorporated
AAB Towing
ABC Limo Service

Postfix compression would compress out the trailing blanks but the data values after prefix compression would look like this:

AA Auto Rentals
2A - California
2B Incorporated
3 Towing
1BC Limo Service

If the value of average data length is much smaller than the maximum size of the field(s) that make up the index, then either many values have long runs of trailing blanks or the data values are very similar. In that case the index may actually decrease performance, because most of the index and data pages will be read instead of a small subset of each.
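
One way to gauge how many duplicates an index carries before relying on it is to compare the total and distinct value counts. The query below is only a sketch; it assumes that custnamex is an index on the customer column of a CUSTOMER table:

SELECT COUNT(*) - COUNT(DISTINCT customer) AS duplicate_values FROM customer;

The result roughly corresponds to the total dup figure reported in the statistics; if nearly every value is a duplicate or shares a long common prefix with its neighbors, the index may not be worth keeping.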
