Introduction to SQL, Part 2

SQL
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

Editor's Note: This article is an excerpt from Mastering IBM i (MC Press, 2011).

 

In the first article of this series, we began to examine the basic form of an SQL statement to obtain information from a database table or tables:

Select field-list

From file-list

Where conditional-expression

We discussed how to specify the Select field list, and we looked at some examples. In this article, we consider the From file list and the Where clause.

From file-list

For a simple SQL query, file-list consists of one file name. All fields named in field-list must be contained in that file’s record format—or be derived from it if they are alias fields. The examples in the previous article required that we name only EMPPF.

If you need data from two or more files, you must name each contributing file in the file list. The result will be a join select. Just as in a join logical file or a join query in Query for i5/OS, each pair of joined files must specify a join relationship to ensure that records are properly matched. For example, if you wanted name, address, city, state, and zip-code information for employees, you would need to join file EMPPF to file ZIPPF.

Because field names and relationships between files are so important in using SQL, we have reproduced, in Figure 13.10, the Bachman diagram illustrating the employee-project database used in earlier examples in the Mastering IBM i book. Please refer to this diagram to help you understand the SQL examples that follow.

121212BuckFigure 13 10 

Figure 13.10: Bachman Diagram

The Run SQL Scripts interface has some additional features that we should mention. You can enter multiple SQL statements by separating them with a semicolon, adding comments as you see in Figure 13.11.

 121212BuckFigure 13 11

Figure 13.11: Run SQL Scripts Window with Multiple SQL Statements

If you are working and want to save your work to continue later, you can also save the contents of the input pane as an SQL script by selecting Save from the File drop-down menu in the top-left corner of the window.

If you later want to open the Run SQL Scripts interface without opening IBM i Navigator, you should save the SQL script. You can then double-click the script to open it. The Run SQL Scripts interface will open, and you will be prompted for a database connection, your user ID, and your password.

You can use the From clause to specify three types of Join operations: Inner Join (type 1), Left Outer Join (type 2), and Exception Join (type 3). The following three examples show the proper syntax and the results for each of these Join types.

Example 1

Select FIRSTNAME, LASTNAME, ADDR1, CITY, STATE, E.ZIP

From EMPPF E Inner Join ZIPPF Z

On   E.ZIP = Z.ZIP

When you use Inner Join, all records from EMPPF that have a matching record (same zip code) in ZIPPF are displayed (see Figure 13.12).

 121212BuckFigure 13 12

Figure 13.12: Inner Join Results

Example 2

Select FIRSTNAME, LASTNAME, ADDR1, CITY, STATE, E.ZIP

From EMPPF E Left Outer Join ZIPPF Z

On   E.ZIP = Z.ZIP

When you use Left Outer Join, all records from EMPPF will be displayed. If an employee record is unmatched in ZIPPF, default values are displayed for CITY and STATE (see Figure 13.13).

 121212BuckFigure 13 13

Figure 13.13: Outer Join Results

Example 3

Select FIRSTNAME, LASTNAME, ADDR1, CITY, STATE, E.ZIP

From EMPPF E Exception Join ZIPPF Z

On   E.ZIP = Z.ZIP

When you use Exception Join, only EMPPF records that have no matching ZIPPF records are displayed (see Figure 13.14). Because the field list includes the CITY and STATE fields, they will take default values. (Understanding the mechanism of an Exception Join, we normally would not request fields from the right side file.)

 121212BuckFigure 13 14

Figure 13.14: Exception Join Results

Notice that as far as syntax is concerned, the only difference among the three statements is in the join specification keywords: Inner Join, Left Outer Join, and Exception Join.

The From clauses in the preceding examples assign abbreviated names—E and Z—to the EMPPF and ZIPPF files, respectively. These abbreviated names are called correlation names, and they can be used for qualification when fields with the same name are referenced in two or more files. The intended purpose of a correlation name is to correlate, or synchronize, record pointers between records of two files (or copies of the same file) in something called a correlated subquery, an example of which we will look at later. Although we could have used the full file name for qualification (e.g., EMPPF.ZIP), an abbreviated name is easier to deal with even if we are not using correlated subqueries.

If no qualification had been used, SQL would not have known which ZIP field was being referenced and would have displayed the message “Column name ZIP is ambiguous.” The qualifier serves the same purpose as the JREF keyword required for nonunique field names in a join logical file and the file ID field of a Query for i5/OS join.

You can join more than two files in a single Select statement, and you can use more than one type of join. Suppose, for example, that you needed to list all employees with projects in the PRJMBRPF file, showing each project code to which an employee is assigned, and that you wanted to show the project leader and project description for each PRJMBRPF record. This information comes from the PRJPF file. To avoid listing employees with no projects, you would want an Inner Join between EMPPF and PRJMBRPF. However, if you suspected there might not yet be a PRJPF record created for a new project code already assigned in the PRJMBRPF file, you would want to specify a Left Outer Join between PRJMBRPF and PRJPF. The Select statement to accomplish all that would be

Select E.EMPNO, LASTNAME, PM.PRJCD, PRJLDR, DESC

From EMPPF E Inner Join PRJMBRPF PM

On   E.EMPNO = PM.EMPNO

Left Outer Join PRJPF P

On   PM.PRJCD = P.PRJCD

Figure 13.15 shows the output from this Select statement.

 121212BuckFigure 13 15

Figure 13.15: Join of Three Files (EMPPF, PRJMBRPF, and PRJPF)

Notice that the output shows a project Junk for employee Slick. This invalid project code was inserted into PRJMBRPF using Slick’s employee number to test the join operation. Project Leader (PRJLDR) and Description (DESC) are empty for project Junk because no matching PRJPF record was found. This is exactly as we would expect with the Left Outer Join specified between those two files.

Note that in earlier versions of the IBM DB2 database, the keywords Inner Join, Left Outer Join, and Exception Join were not recognized in SQL, and the only type of true join permitted was the Inner Join. A Where clause was used to specify the join relationship, and the output would not have shown the Junk project record because it was unmatched in the project file. The code to accomplish an Inner Join, without the Join keyword, would be

Select E.EMPNO, LASTNAME, PM.PRJCD, PRJLDR, DESC

From EMPPF E, PRJMBRPF PM, PRJPF P

Where E.EMPNO = PM.EMPNO

And   PM.PRJCD = P.PRJCD

Figure 13.16 displays the results of implementing this code.

 121212BuckFigure 13 16

Figure 13.16: Select Three Files Using a Where Clause

In this example, all the files needed to obtain the desired information are listed in the From file list, with their correlation names, if used. Following the file list, one join relationship for each pair of files is coded as a Where condition. It is imperative that one equality relational expression (E.EMPNO = PM.EMPNO) is provided to define the equijoin relationship (a join condition that uses the equal sign, =, as the comparison operator for each pair of files). This expression defines to SQL, by the line connecting the relationship-supporting fields of the related files, the relationship between two files documented in the Bachman diagram.

Although this syntax is still supported, you should avoid it. It is prone to errors because SQL does not check to make sure the join relationship is present. Running an SQL join select without one or more join relationships can result in a huge amount of totally useless output. Always name the join type using the On clause to specify the join relationship. If you forget an On, the syntax checker will catch it.

Where conditional-expression

You use the Where clause to limit the rows selected for the result table; it works in much the same manner as DDS Select/Omit entries used for logical files. In fact, anything you can do with Select/Omit you can certainly do with Where expressions; but the versatility and power of SQL’s Where expression considerably exceeds DDS’s Select/Omit.

Probably the most common Where expression is some kind of relational expression, in the general form

operand1 relational operator operand2

Operands can be field names, constants, string expressions, arithmetic expressions, scalar functions, and special registers, such as CURRENT_DATE.

Relational operators include =, >, <, >=, <=, and <> (not equal). You can combine multiple relational expressions into complex expressions by using the logical operators Not, And, and Or. Regardless of how complex a set of expressions may become, SQL always evaluates the expression to a single True or False result, which determines whether a row is selected for the result table (True) or rejected (False).

We will start with some simple examples of the Where clause. Assume that each of the following eight examples begins with

Select *

From EMPPF

Example 1

List the employees who earn less than $25,000.

... Where SALARY < 25000

Notice that no editing symbols are used in the numeric constant; both $25,000 and 25,000 would be errors.

Example 2

List employees who earn less than $25,000 and are in the Sales Department.

... Where SALARY < 25000

 And DEPT = 'Sales'

The constant 'Sales' must be typed exactly as the value is stored in the database, and in the same case. Neither 'sales' nor 'SALES' would produce any results. Alphanumeric constants must be enclosed in apostrophes (').

Example 3

List all employees who have a zip code in the range of 514000000 to 523000000 (inclusive).

... Where ZIP >= 514000000

   And ZIP <= 523000000;

You cannot imply the subject (operand 1) in the second expression; that is, the expression

ZIP >= 514000000 And <= 523000000

would be an error. However, there is a simpler syntax for a range test such as this one. It uses the SQL Between keyword and would be written as

... Where ZIP Between 514000000 And 523000000

The expression treats the range values as inclusive,

Between Value1 And Value2

so the statement above is exactly equivalent to the And complex expression of Example 3.

Example 4

List all employees born between the months of January and March.

... Where Month(BIRTHDATE) Between 1 And 3

Here, we use the scalar Month date function to extract the month value from the date field BIRTHDATE. Other often-used date functions include Year and Day, which extract the four-digit year and the two-digit day of the month, respectively, from a date field or expression.

Example 5

List all employees born in the month of November, December, January, or February.

... Where Month(BIRTHDATE) = 11

   Or Month(BIRTHDATE) = 12

 Or Month(BIRTHDATE) = 1

    Or Month(BIRTHDATE) = 2

Here, we use several relational expressions, each connected to the next by Or. However, a far simpler approach is to use the In function. The In function reduces the need for multiple Or statements. One way to use an In function works just the same as a VALUES keyword in DDS. You code the values you want to test against as constants, separated by commas, and enclose the whole list in parentheses. So, for our example, we would code

... Where Month(BIRTHDATE) In (11,12,1,2)

Example 6

List all employees who live in Cedar Rapids, Iowa.

You know there are several zip codes for Cedar Rapids, but you are not sure what they are. Of course, you could look them up and code them in a static In function list, as above, but SQL can also look them up. SQL can create a list of Cedar Rapids zip codes for you from the data in file ZIPPF and then compare each employee’s zip code against the list. To accomplish this, we simply code another Select statement as the In function list value:

... Where ZIP In

       (Select ZIP From ZIPPF

          Where CITY = 'Cedar Rapids')

When a Select statement is nested inside an outer Select statement’s Where clause, it is referred to as a subquery.

If we wanted the results to include employees from several different cities, we could use an In function inside the subquery:

... Where ZIP In

       (Select ZIP From ZIPPF

             Where CITY In ('Clarinda’,

                                 'Kalona',

                                 'Amana'))

Note the need for two closing right parentheses to balance the expression.

Example 7

List all employees who are not project leaders.

... Where EMPNO Not In

     (Select PRJLDR From PRJPF)

In this case, although EMPNO and PRJLDR are not the same field, they share the same domain because they are the Social Security numbers of employees. We instruct SQL to create a list of project leaders using the subquery, and then we ask for employees who are not in that list. Using the Not logical operator with a subquery produces the same result as an Exception Join:

Select E.*

From EMPPF E Exception Join PRJPF

   On EMPNO = PRJLDR

By qualifying * to the correlation name E for EMPPF, we can avoid including all the fields from PRJPF in the output. If we had simply coded Select *, all fields from both files would have gone to the output, and all the PRJPF fields would have had default values. Figure 13.17 shows the output of this Select.

 121212BuckFigure 13 17

Figure 13.17: Select Using “E.*”

Example 8

List all employees who have Road or Rd in their address.

... Where ADDR1 Like '%Road%'

     Or ADDR1 Like '%Rd%'

The Like logical operation provides a powerful technique for matching a search string against a substring of a field from each record. Two wildcard characters, the percent sign (%) and the underscore (_), can be used. The % character is replaced in the target string by any number of unknown characters, while the _ character is replaced by exactly one unknown character. For example, LIKE '%at' would return True results for hat, Hat, cat, Rat, or Frat, while LIKE '_at' would return True results for hat, Hat, cat, or Rat, but False results for Frat. Neither would return true results for Fraternity.

In the solution for Example 8, we checked the ADDR1 field but overlooked ADDR2. We could add ADDR2, checking to the Where clause with more Or operators:

... Where ADDR1 Like '%Road%'

    Or ADDR1 Like '%Rd%'

     Or ADDR2 Like '%Road%'

     Or ADDR2 Like '%Rd%'

Or we could combine ADDR1 and ADDR2 using concatenation:

... Where ADDR1 Concat ADDR2 Like '%Road%'

    Or ADDR1 Concat ADDR2 Like '%Rd%'

As an alternative, we could try putting it all into a single simple expression:

... Where ADDR1 Concat ADDR2 Like '%R%d%'

This method seems like a clever way to perform a fairly complex search with a single statement because the middle % could substitute for oa of Road or for the zero characters between the R and d of Rd. In addition, this expression would certainly pull out all addresses containing Road or Rd. However, it could also give us more than we bargained for. For example, Mr. Hunn lives at 2379 Redbud Boulevard. Figure 13.18 shows the SQL statement results of using the single expression.

 121212BuckFigure 13 18

Figure 13.18: Select Using “Where ADDR1 Concat ADDR2 Like '%R%d%'”

In the next article, we will look at some additional SQL capabilities, including the Distinct keyword, the Order By clause, column functions, views, and correlated subqueries.

BLOG COMMENTS POWERED BY DISQUS