| TechTip: Create an SQL Function to Split a Delimited List |
|
|
|
| Tips & Techniques - SQL | |||||||||||||||
| Written by Michael Sansoterra | |||||||||||||||
| Friday, 22 February 2013 00:00 | |||||||||||||||
|
With the aid of a recursive query, add a SPLIT function (common to many programming languages) to your DB2 for i toolbox.
Many programming languages, such as Visual Basic, C#, Java, JavaScript, and others, have a "split" function. Split allows a developer to take a single string containing a delimited list of items and separate the individual items into a dynamically sized array. Every so often, I find myself longing for a similar function in DB2 for i's SQL. While DB2 for i doesn't have a SPLIT function, it is very easy to create one by embedding the power of recursion within a table function. But instead of splitting the list into an array, the table function will split each item into a separate row within a result set. The technique shown in this tip requires DB2 for i V5R4 or higher.
Before delving into code, let's look at an example of why a SPLIT function is useful within SQL. Say a Web or GUI client is designed to let users inquire against the database based on various filter criteria. Further, say some of the filter selections can allow for multiple values. For example, the quarter selection list shown in Figure 1 will let the user select multiple quarters:
Figure 1: The list box will allow a user to select from zero to many quarters.
When the user selects multiple quarters, how should the client pass those multiple values to the database for processing? In this case, one approach is to let the user pick a maximum of 10 quarters and then create a stored procedure that defines 10 quarter parameters.
A better alternative is to have the client join the selected values in a delimited string (e.g., "2010-Q1,2011-Q1") and pass it as a single parameter to a stored procedure. The stored procedure will use the SPLIT function to break apart the list of values and return results accordingly. The advantage of this solution is that it isn't bound to a specific number of parameterized values.
Here's a little more detail on how the solution would be implemented. Assume that a SPLIT User-Defined Table Function (UDTF) is installed on your system and it accepts two parameters: the data (or delimited string) and the delimiter.
When passed multiple values, the SPLIT table function will convert the delimited list (in this case assuming a pipe character (|) delimiter) to a set of rows.
SELECT * FROM TABLE(QGPL.SPLIT('2010-Q1|2011-Q1|2012-Q1|2013-Q1','|')) LIST;
The result of the above query is shown in Figure 2:
Figure 2: Here's a sample result set from the SPLIT UDTF.
So SPLIT converts a list into rows and columns, which is something SQL is good at manipulating. A sample query that takes advantage of the SPLIT UDTF might look like this:
SELECT * FROM ORDER_HEADER WHERE QTR_ID IN ( SELECT CAST(VALUE AS CHAR(7)) FROM TABLE(SPLIT('2010-Q1|2011-Q1|2012-Q1|2013-Q1','|')) LIST);
Marching on, let's discuss the code involved with the SPLIT function. The DB2 for i code shown below is used to create a User-Defined Table Function (UDTF) called SPLIT. As for the code, I must give credit where credit is due. I filched this idea from a website that specializes in SQL Server tips.
In this tip, the author Mickey Stuewe discussed the very same multi-select problem when using SQL Server Reporting Services (which can be configured to allow users to select multiple values for a report, similar to the scenario described above). Stuewe published a brilliant T-SQL table function to parse a delimited string. I converted the function to DB2 for i (with a few minor changes), which is shown here:
CREATE FUNCTION QGPL.SPLIT ( @Data VARCHAR(32000), @Delimiter VARCHAR(5))
RETURNS TABLE ( ID INT, VALUE VARCHAR(256))
LANGUAGE SQL DISALLOW PARALLEL DETERMINISTIC NOT FENCED
RETURN WITH CTE_Items (ID,StartString,StopString) AS ( SELECT 1 AS ID ,1 AS StartString ,LOCATE(@Delimiter, @Data) AS StopString FROM SYSIBM.SYSDUMMY1 WHERE LENGTH(@Delimiter)>0 AND LENGTH(@Data)>0
ID + 1 ,StopString + LENGTH(@Delimiter) ,LOCATE(@Delimiter, @Data, StopString + LENGTH(@Delimiter)) FROM CTE_Items WHERE StopString > 0 ) SELECT ID, SUBSTRING(@Data,StartString, CASE WHEN StopString=0 THEN LENGTH(@Data) ELSE StopString-StartString END) FROM CTE_Items; There's not much to this function. A recursive common table expression (RCTE) is used to find the starting and ending character positions of all the items in the data string. The delimiter, of course, identifies the end of a data item and the start of the next. The "prime" query in the RCTE identifies the first and last positions of the first data item. The "recursive" portion of the query in the RCTE identifies all the starting and ending positions of the remaining items contained in the list. If you're not familiar with RCTEs, please review the references at the end of this tip.
A few additional notes to consider:
Having a SPLIT function in your database can be useful in a vast number of situations. For instance, filter criteria selections are often combined into a single parameter to avoid the pain of defining multiple parameters of an unknown quantity. Or, in a client/server scenario, multiple data elements may be consolidated so that only a single call is made to the database server from the client (instead of passing small amounts of data using successive database calls), etc. When these types of solutions are architected, SPLIT is a key to allowing the database to handle them. ReferencesPractical SQL: Don't Be Afraid of Recursion TechTip: DB2's CONNECT BY Simplifies Recursive Processing Recursive query optimization (IBM DB2 i 7.1 documentation) | |||||||||||||||
View all articles by this author
|
|||||||||||||||
| Last Updated on Friday, 22 February 2013 00:00 |





You must be logged in to view or make comments on this article