InterBase Quick Start: Part IV - Joining Tables

From InterBase
Jump to: navigation, search

Go Up to InterBase Quick Start: Part IV - Retrieving Data

Joins enable a SELECT statement to retrieve data from two or more tables in a database. The tables are listed in the FROM clause. The optional ON clause can reduce the number of rows returned, and the WHERE clause can further reduce the number of rows returned.

From the information in a SELECT that describes a join, InterBase builds a table that contains the results of the join operation, the results table, sometimes also called a dynamic or virtual table.

InterBase supports two types of joins: inner joins and outer joins.

Inner joins link rows in tables based on specified join conditions and return only those rows that match the join conditions. If a joined column contains a NULL value for a given row, that row is not included in the results table. Inner joins are the more common type because they restrict the data returned and show a clear relationship between two or more tables.

Outer joins link rows in tables based on specified join conditions but return rows whether they match the join conditions or not. Outer joins are useful for viewing joined rows in the context of rows that do not meet the join conditions.

Correlation Names

Once you begin to query multiple tables, it becomes important to identify unambiguously what table each column is in. The standard syntax for doing this is to state the table name followed by a period and the column name:

 table_name.col_name

In complex queries, this can get very tedious, so InterBase permits you to state a shorter version of the table name in the FROM clause of a join. This short name is called a correlation name or an alias. You will see many examples of correlation names in the next few pages. The form is as follows:

 SELECT a.col, b.col FROM table_1 a, table_2 b 
     ON a.some_col = b.some_col
     WHERE a.conditional_col <condition>
     ...

Notice the FROM clause, where table_1 is given the correlation name of a and table_2 is named b. These abbreviated names are used even in the initial select list.

Important: If you include a subquery in a join, you must assign new correlation names to any tables that appeared in the main query.

Inner Joins

There are three types of inner joins:

  • Equi-joins link rows based on common values or equality relationships in the join columns.
  • Joins that link rows based on comparisons other than equality in the join columns. There is not an officially recognized name for these joins, but for simplicity’s sake they can be categorized as comparative joins, or non-equi-joins.
  • Reflexive or self-joins compare values within a column of a single table.

To specify a SELECT statement as an inner join, list the tables to join in the FROM clause, and list the columns to compare in the ON clause. Use the WHERE clause to restrict which rows are retrieved. The simplified syntax is:

SELECT <columns> | *
     FROM left_table [INNER] JOIN right_table
     ON left_table.col = right_table.col
      [WHERE <searchcondition>]

There are several things to note about this syntax:

  • The INNER keyword is optional because INNER is the default join type. If you want to perform an outer join, you must state the OUTER keyword explicitly.
  • The FROM statement often specifies the correlation names
FROM table1 t1 JOIN table2 t2
  • The operator in the ON clause does not have to be equality. It can be any of the comparison operators, such as !=, >, >=, or <>.

Image 025.jpg Performing Inner Joins

1. Enter the following query to list all department managers and their departments where the manager earns more than 80,000. (This is not stated as dollars because some of the employee salaries are in other currencies.)

 SELECT D.department, D.mngr_no  
        FROM Department D INNER JOIN Employee E   
        ON D.mngr_no = E.emp_no  
        WHERE E.salary > 80000  
       ORDER BY D.department
The result set should look like this:
TableQuery24.png

2. The next query uses a subquery to display all departments and department managers where the manager’s salary is at least 20% of a department’s total salary:

 SELECT D.department, D.mngr_no, E.salary  
      FROM Department D JOIN Employee E   
        ON D.mngr_no = E.emp_no 
       WHERE E.salary*5 >= (SELECT SUM(S.salary) FROM Employee S   
           WHERE D.dept_no = S.dept_no)  
     ORDER BY D.department
The subquery sums the salaries for one department at a time and hands out the result to be compared to the manager’s salary (multiplied by 5).
The result set should look like this:
TableQuery25.png
Note: Joins are not limited to two tables. There is theoretically no limit to how many tables can be joined in one statement, although on the practical level of time and resources, 16 is usually considered the workable maximum.

Outer Joins

Outer joins produce a results table containing columns from every row in one table and a subset of rows from another table. Outer join syntax is very similar to that of inner joins.

SELECT col [, col ] | *  
  FROM left_table {LEFT | RIGHT | FULL} OUTER JOIN 
    right_table ON joincondition 
   [WHERE <searchcondition>]

The joincondition is of the form left_table.col = right_table.col where the equality operator can be replaced by any of the comparison operators.

With outer joins, you must specify the type of join to perform. There are three types:

  • A left outer join retrieves all rows from the left table in a join, and retrieves any rows from the right table that match the search condition specified in the ON clause.
  • A right outer join retrieves all rows from the right table in a join, and retrieves any rows from the left table that match the search condition specified in the ON clause.
  • A full outer join retrieves all rows from both the left and right tables in a join regardless of the search condition specified in the ON clause.

Outer joins are useful for comparing a subset of data in the context of all data from which it is retrieved. For example, when listing the employees that are assigned to projects, it might be interesting to see the employees that are not assigned to projects, too.

Image 025.jpg Practicing with Joins

The following outer join retrieves employee names from the Employee table and project IDs from the Employee_project table. It retrieves all the employee names, because this is a left outer join, and Employee is the left table.

SELECT e.full_name, p.proj_id 
 FROM Employee e LEFT OUTER JOIN Employee_project p 
 ON e.emp_no = p.emp_no 
 ORDER BY p.proj_id
This should produce a list of 48 names. Notice that some employees are not assigned to a project; the proj_id column displays <null> for them.

1. Now reverse the order of the tables and execute the query again.

 SELECT e.full_name, p.proj_id 
 FROM Employee_project p LEFT OUTER JOIN Employee e 
 ON e.emp_no = p.emp_no 
 ORDER BY p.proj_id
This produces a list of 28 names. Notice that you get different results because this time the left table is Employee_project. The left outer join is only required to produce all the rows of the Employee_project table, not all of the Employee table.

2. As a last experiment with joins, repeat the query in Exercise 2 (the one you just did), but this time do a right outer join. Before you execute the query, think about it for a moment. What do you think this query will return?

SELECT e.full_name, p.proj_id 
 FROM Employee_project p RIGHT OUTER JOIN Employee e 
 ON e.emp_no = p.emp_no 
 ORDER BY p.proj_id
You should get the same result set as in Exercise 1. Did you realize that performing a right outer join on tables B JOIN A is the same as a left outer join on tables A JOIN B?

Advance To: