|Practical SQL: OLAP, Part 1|
|Programming - SQL|
|Written by Joe Pluta|
|Wednesday, 03 April 2013 01:00|
Online Analytical Processing (OLAP) adds a variety of processing features to SQL syntax that makes it much more useful as an ad hoc analysis tool, and this article introduces the first of those functions.
SQL has always been a great tool for querying data. The problem is that SQL doesn't always lend itself to the sort of pattern analysis that's needed in enterprise data processing. Something goes awry in the database and the only way to identify the source of the problem is to backtrack through history. Let me give you a real-world example (the data has been changed to protect the innocent, but the situation is pretty common).
Where Did All This Inventory Come From?
The situation is a traditional ERP shop that handles lots and lots of inventory transactions: receipts, issues, transfers, adjustments, you name it. Multiple systems are involved—from purchasing and receiving through manufacturing to warehouse management and shipping. Transactions come from shop floor control, shipping and receiving, and everything in between, including manual adjustments and even EDI.
Modifications are made to the systems over the years to make life a little easier for everyone, and, generally speaking, the system is a great success.
But suddenly one day a problem is noticed when the system's trying to execute an issue. A whole lot of inventory is missing! After an initial mad scramble to identify the problem, it's found that it's not that inventory is missing, it's just that some inventory locations have been over-reported: the database shows more inventory than actually existed. And now the hunt is on to identify the cause.
The good news is that there's an inventory transaction file that keeps historical information. (Even if you don't have a transaction file, as long as your files are journaled, you can extract the changes and then query them, a technique we'llcover another day.) A quick review of the file shows the following:
The transaction file has five primary fields: the item number, the event, the from location, the to location, and the quantity. There's also a unique transaction ID that is incremented by one for every record written to the file. Compare the first two records; these records together form an inventory movement: 15000 units of inventory is adjusted out of location BIN-C-7 and subsequently 15000 units is adjusted into location SCRAP. When the system was originally written, all inventory movement worked this way: data was adjusted out of one location and into another. In fact, at that time the file only had one location, ITLOC. The ITFROM field was added recently as part of a change that allowed a single TRANSFER event to take the place of two adjustments. The QC application (used to scrap inventory) still uses the older two-adjustment method, but the warehouse put-away application has been updated to use the new single-record technique.
Except that there seems to be a problem. Originally, when a truck was unloaded, the system did all the receipts, then did all the adjustments off the DOCK location, and then finally did all the adjustments to the put-away locations. These were three separate processes. The put-away process was modified to use TRANSFER transactions, but evidently under certain circumstances the second adjustment is still being processed. So now we see a TRANSFER that removes inventory from the DOCK1 location and increases the BIN-A-1 location, followed by a second ADJUST transaction, which also increases BIN-A-1. And now we've got double inventory and bad things are going to happen.
It's pretty easy to see the problem by reviewing the data for this item. However, the next part is tougher: how do we use SQL to find all the cases where this has occurred? It would be relatively easy with an RPG program: read through the file by item and transaction ID, set a flag when a TRANSFER is read, and check the next record to see if it's a duplicate adjustment (the same location and quantity). But that concept of "nextness" isn't always easy to come by in SQL. In fact, in this situation it's nearly impossible, because the only thing we can say is that the transaction ID of the adjustment has to be greater than the transfer but with no other records in between. You can do that in SQL, but it's a little bit of a stretch; you probably have to use a JOIN as well as a WHERE NOT EXISTS. I'll leave that exercise to the user.
With OLAP, it's actually relatively simple. Here's the query:
with T1 as (select INVTRN.*,
row_number() over (order by ITITEM, ITTXID) as E_NUMBER
select A.ITITEM, A.ITEVENT, A.ITTO, A.ITQTY, A.ITTXID,
B.ITEVENT, B.ITTO, B.ITQTY, B.ITTXID
from T1 A join T1 B using (ITITEM)
where A.E_NUMBER = B.E_NUMBER - 1
and A.ITEVENT = 'TRANSFER' and B.ITEVENT = 'ADJUST'
and (A.ITLOC, A.ITQTY) = (B.ITLOC, B.ITQTY)
At first glance it may be a little cryptic, but let me break it down. There are really just two simple parts. The first part is a common table expression (CTE) that attaches an OLAP function to the file. The first three lines basically create a CTE named T1, which consists of all the fields from INVTRN as well as the row_number OLAP function. The row_number function simply generates a unique, sequential row number for each record in the query. The syntax of row_number (and this is common among all OLAP functions) allows you to specify the order that the records are looked at when the function is being processed. So, as specified, the field E_NUMBER contains a sequential number starting at 1 for the first record in ITITEM/ITTXID sequence and working its way up. There are many other functions, and variations of the syntax allow you to execute these functions over subsets of the data. I'll go into those issues in more detail in subsequent articles.
After that, the query is pretty straightforward. I join the CTE against itself by item number. This allows me to directly compare all the records for one item to each other. The next line is the money line: I compare only those pairs where the first entry's row number is equal to the second entry's row number minus one. That is, the first entry directly precedes the second entry. This is the part that would have taken a pretty significant SQL statement (as I said, probably requiring a WHERE NOT EXISTS) but instead is now a very simple comparison, which you can use as a template for any similar requests. The next two lines then apply the business logic: a TRANSFER followed by an ADJUST with the same location and amount.
If you run the query against just the records above, you get this:
ITEMA TRANSFER BIN-A-1 10,000.00 102029303231 ADJUST BIN-A-1 10,000.00 102029434501
And that's exactly what we expect. More importantly, it would work no matter how many records were there, as long as there were no records for that item between the two, and it would also show any other record pairs where this occurred.
I'll follow this article up with some more examples introducing more of the OLAP functions and their variations.
P.S. If you run this on a machine at V5R4, the first field in the select must be qualified; that is, it must be A.ITITEM. However, in subsequent releases the ANSI standard of not qualifying fields in a USING clause applies, so you would instead only select ITITEM (all the other fields must remain qualified). Just a word to the wise. Have fun!
|Last Updated on Tuesday, 02 April 2013 13:24|