Learn how to use SQL cross joins to accomplish some difficult tasks.
Long ago, before my hair was gray, I
learned about cross joins. Now, maybe it's my failing memory, but I cannot
remember learning any good examples of why I would ever want to do a cross join.
However, strangely enough, I recently came across a couple of great examples of
using cross joins.
First, let's discuss what a cross join, also called a
Cartesian join, is. This type of join connects all records from one file to all
records in another file. So if file A has three records and file B has four
records, the resulting join has 12 records (see Figure 1). The size of the
resulting table grows exponentially with the size of joined tables, leading to
performance issues. This is fine with small files, but as the files grow larger,
the performance penalty grows higher. So be careful which files you apply this
technique to.
Figure 1: Here's a sample of what you get with a cross join.
My favorite example of using a cross join answers a frequent
request: "How can I get totals on the same line with my detail?" (See Figure
2.)
Part On Hand Qty Total Qty
Percent 101 1,000 31,000 3 102 20,000 31,000 65 103 10,000 31,000 32
Figure 2 : This is the result of a Select with totals on the detail
line.
Figure 3 shows the SQL statement used to generate those
results. The statement pulls data from the Item Master file. Column 1 is the
part number, column 2 is the current on-hand quantity in inventory, column 3 is
the total quantity of all items in inventory, and column 4 is each item's
percent of the total inventory.
Select Item, Onhand, Total, Dec(Onhand/Total,3,2)*100 as Percent From ITEMMAST cross join (Select Sum(onhand) as Total from ITEMMAST) AS Totalfile |
|
Figure 3: This is an SQL statement with a cross join to summary
record.
The second file (Totalfile) is not really a file, but rather
a SELECT statement that builds a temporary file object for use with just this
statement. The column named Total is actually the result of a SUM() on the
on-hand quantity column. Figure 4 illustrates how this join works.
Figure 4: Here's how this type of cross join works.
The many
variations on this theme allow programmers to pull column functions--such as
totals, averages, minimums, maximums, standard deviations, and more--into each
detail line. The cross join provides a simple vehicle to bring these column
functions into the detail lines.
A second and more expensive use of the
cross join is to connect every record in a table to every other record in the
same table. Figure 5 shows the basic concept behind this kind of cross join.
Figure 5: Here are the results of a cross join to
self.
Finding a good example for this kind of cross join always
eluded me until a coworker came to me asking for suggestions on using SQL to
solve a problem for a client. The challenge was this: Identify all items whose
part number contains one of a selected list of parts. Figure 6 shows a list of
sample parts. What the client wanted was to select parts 101, 102, and 103 and
then produce a list of other part numbers that contained those part numbers.
Figure 6: These are the sample part numbers.
The SQL statement
in Figure 7 accomplishes this task using a cross join. The file is joined to a
sub-select of itself, including only selected records. The item numbers of all
of these candidate records are evaluated to see if the candidate part numbers
are like the selected part numbers. Finally, all records in which both items are
exactly alike are omitted.
Select A.Item, B.Item From ITEMMAST A cross join (Select Item from ITEMMAST Where Selected = 'Y') AS B Where A.Item Like '%' CONCAT Strip(B.Item) CONCAT '%' AND A.ITEM <> B.ITEM |
|
Figure 7: Here's an SQL statement with a cross join to a
sub-select.
Figure 8 shows the results. The use of LIKE, STRIP, and
CONCAT were critical to performing the comparisons of one item number to
another. This problem provides a great example of a situation in which cross
joining a file to itself or another file is a great solution to a problem.
Figure 8: And here's your final cross join illustration.
Remember that large numbers of records in the joined files can
create serious performance issues. Consider using sub-selects where possible to
reduce the size of the result table.
Kevin Forsythe is the author of the new book SQL for eServer i5 and
iSeries. He has over 18 years of experience working with
the iSeries platform and its predecessors. He has been a member of the DMC team
for the past nine years. Kevin's primary responsibility is providing iSeries
education, but he also provides customers with project management, system
design, analysis, and technical construction. In addition to his technical
skills (RPG IV, CL, OS/400, SQL, FTP, Query, VB, Net.Data), Kevin possesses the
ability to communicate new and complex concepts to his students. He has been the
primary instructor for DMC's iSeries-based AS/Credentials training courses since
1997 and has authored courses such as Advanced ILE, SQL, Embedded SQL,
Operations Navigator, and Intro to WebSphere Studio. An award-winning speaker,
he has spoken at every COMMON Conference since the spring of
2000.
|