RDBMSs have traditionally supported a limited number of built-in functions to perform operations such as computing averages, calculating sums, concatenating character strings, and extracting portions of data from character strings. However, firms may need to extend the range of functions that can be performed with system-supplied data types—especially functions specific to a certain country or industry. For example, functions that compute payroll withholding tax or hotel room tax will never be included with a relational database product.
User-defined functions (UDFs) cope with these new requirements by providing a way to extend and customize SQL to meet individual needs. A UDF is similar to a local procedure or to a function within a high-level language (HLL) program (and can be implemented with these constructs on the AS/400). UDFs, however, can return only a single output value. The advantage offered by database UDFs is that they can be called from just about any SQL data manipulation statement (e.g., SELECT, UPDATE, and DELETE). This integration allows you to call your business functions while processing your business data instead of performing the two tasks separately.
Along with making it simpler to integrate your business processing with your database access, UDFs make it easier to port SQL from other databases to the AS/400. Despite the existence of SQL standards, the built-in functions shipped with a database vary widely from vendor to vendor. These differences are caused by proprietary functions that a vendor may add or, more simply, by vendors using different names for the same underlying function. For instance, one vendor might call its square root function SQRT, and another vendor might call its square root function SQUARE_ROOT. UDFs provide a solution to the portability problem, since, now on the AS/400, you can create your own function that matches the name and functionality of a system function provided by another database product.
UDFs also complement binary large objects (BLOBs) and user-defined data types (UDTs.) UDFs can help define the valid set of operations and behaviors associated with new types. In this respect, UDFs are a bit like the methods of an object-oriented environment. Kevin Vandever’s article “You’ve Reached Extension DB2 UDB” in the October 1999 issue of MC provides an overview of how UDFs can be used with UDTs
and object-oriented capabilities, but this article focuses on the more general aspects of using UDFs to customize and extend SQL.
A sourced UDF duplicates the semantics of an existing function. The existing function can be an operator, a scalar function, or a column function. Sourced UDFs are particularly useful for allowing a UDT to selectively inherit the semantics and operators of its underlying source type. For example, a sourced UDF would come in handy if you defined a money data type based on DECIMAL(11,2) and wanted to be able to use some of the same functions that support the DECIMAL type with your new money type.
Another use for sourced UDFs is the porting of SQL statements from other database products to DB2 UDB for AS/400. Sourced UDFs are useful in cases where you encounter a function that has the same functionality as an AS/400 built-in function but DB2 UDB for AS/400 uses a different name. For example, the square root problem mentioned above could be solved on the AS/400 as follows:
CREATE FUNCTION square_root
The Create Function Statement and Its Options
External and SQL UDFs are the more general-purpose UDF types; they can be used in database porting exercises as well as for customizing and extending SQL to meet your requirements. Whatever the UDF’s purpose, all UDF types must be created with the SQL Create Function statement. Figure 1 contains Create Function syntax that lists the options available to SQL and external UDFs. (A complete syntax diagram for Create Function is in the DB2 UDB for AS/400 SQL Reference.)
Many of these options apply to both SQL and external UDFs and are reviewed here. (Options specific to just external UDFs have been deferred to the section covering that UDF type.) Declaring the input parameter(s) (there can be zero to 90 of them) and return value is straightforward; parameter and return value types use the same SQL data types as those used on the column definitions on the Create Table statement. The LANGUAGE clause is also straightforward. The SPECIFIC clause is optional but useful when function overloading is being done with your UDFs. With function overloading, the UDFs all have the same function name, so the SPECIFIC clause can be used to assign a unique identifier to each of the overloaded functions.
The DETERMINISTIC/NOT DETERMINISTIC option indicates whether or not the function will always return the same result value if called repeatedly with the same input value. The square root function is a good example of a deterministic function; the square root of 4 is always 2. A UDF that generates a random number is a good example of a function that is not deterministic. DB2 UDB may try to improve performance by rewriting the query or using cached results of function calls instead of invoking the function, so this option lets the database know whether or not the function has to be called. DB2 UDB wouldn’t be able to use cached results for a nondeterministic function such as a random- number function, because the result value would be different each time.
The EXTERNAL ACTION/NO EXTERNAL ACTION option also influences whether the database can rewrite the query or use cached results or has to invoke the function. The UDF may be deterministic, but, if it performs other actions besides just returning a value, it still needs to be called. Placing information on a data queue and updating another table are examples of external actions because these actions are completed only if the UDF is invoked.
The CALLED ON NULL INPUT/RETURNS NULL ON NULL INPUT clause controls what happens when your function is invoked with a null value as one of the arguments. If RETURNS NULL ON NULL INPUT is specified, DB2 UDB will never pass a null argument to your UDF. If a null input is detected, DB2 UDB will automatically consider the result of the function null. This option makes UDFs easier to write because the UDF does not need to test for null parameters. Performance is improved because the UDF is not invoked for null arguments.
The DISALLOW PARALLEL/ALLOW PARALLEL clause tells whether or not it’s safe to call the function from an SQL statement being implemented with parallel processing. The usual reason for disallowing parallel execution of a function is that one invocation of the function needs to pass along information to the next invocation via a scratchpad (more on that later). On the AS/400, ALLOW PARALLEL is applicable only when parallel processing has been enabled with the DB2 Symmetric Multiprocessing (SMP) licensed feature.
The FENCED/NOT FENCED option is provided only for compatibility with the DB2 UDB family. It has no affect on DB2 UDB for AS/400 UDFs.
Finally, the STATIC DISPATCH clause must be specified if the function contains any parameters that reference UDTs.
SQL UDFs are the easiest type of UDF to implement because of the simplicity of the parameter passing and because of the BASIC-like language. The simple language used for SQL UDFs reuses many of the same constructs (e.g., FOR and WHILE loops) delivered with the SQL Stored Procedure language offered in V4R2. Like the Stored Procedure language, SQL UDFs have the drawback of requiring the SQL product and C compiler to be installed on the system where the function is being created. Programmers do not have to write or compile C source. The C compiler is required because the SQL processor generates and compiles C source code. Figure 2 shows how easy it can be to put together an SQL UDF. This UDF converts Fahrenheit temperatures to their Celsius equivalents.
The SQL UDF language is rich enough to implement complex business processing and rules, as shown in Figure 3. This CONVERT_CURRENCY function uses error handlers, dynamic SQL, and conditional logic to convert input currency to U.S. dollars. One of the first steps is to make sure that the amount value is positive. This isn’t really a required step, but it serves the purpose of demonstrating how easy it is to use any SQL function (including UDFs) within the function body. The CURRENCY_TYPE input parameter contains the abbreviation of the currency contained in the amount parameter. This CURRENCY_TYPE parameter is then used to select and retrieve the U.S. exchange rate for that currency from the rate table, which is kept current with all of the latest exchange rates. The rate is retrieved with a dynamically prepared SQL statement. If an invalid CURRENCY_TYPE has been passed in by the invoker, the FETCH statement fails with a Row Not Found error that, in turn, causes the NOT FOUND error handler to be invoked. This handler then marks the exchange rate as unknown (i.e., NULL) and lets the function continue processing at the next line.
External UDFs can be a bit more work to create and define, but their variety of languages provides a great degree of flexibility. Being able to use any AS/400 HLL makes it quite easy to reuse existing code written in RPG, COBOL, etc. that implements various business calculations and processes. HLLs also offer richer functionality than does the basic language available for SQL UDFs. In addition, the executable code can be contained in a program or service program. Figure 4 contains the external, C-language version of the Celsius UDF of Figure 2.
As mentioned earlier, several clauses on the Create Function statement apply only to external UDFs. PARAMETER STYLE is a mandatory clause that identifies the conventions used for passing parameters to the external UDF; SQL UDFs have just a single convention. These conventions define how DB2 UDB needs to pass parameters into the UDF and how the output value will be returned to the database. The selected parameter style does not change how the UDF is invoked; the style really defines how the database will internally pass parameters to and from the external UDF. There are four different parameter styles available for external functions, all of which pass arguments by reference (address.)
The GENERAL parameter style is the simplest. Parameters are passed into the external UDF just as they are defined on the Create Function statement. The output value is a little different in that it needs to be passed back on a return operation available in the ILE languages. The UDF in Figure 4 utilizes the GENERAL style.
The GENERAL WITH NULLS style is an extension of the GENERAL style with two additional parameters for handling null-capable input and output parameters. After the input parameters, the next argument is an array of null indicators with one element per input parameter; each element is a 2 byte signed integer. The final argument is a single 2-byte signed integer that represents the null value of the output result. A value of 0 indicates an argument that is not null, and a value of -1 indicates a null value. Both the GENERAL and GENERAL WITH NULLS styles can be used only in conjunction with service programs.
The SQL parameter style is the default parameter style and the one that conforms to SQL standards. The main differences between this style and the previous two styles are that additional parameters are passed by the database and the output value is included in the list of parameters (instead of being specified on the function return value). A further expansion of the SQL parameter style, as shown in the C code in Figure 5, is probably the easiest way to understand it.
In addition to the output value being returned differently, the handling of null indicators for input parameters is slightly different with the SQL style. Instead of an array containing null indicators for all input parameters, there is one null indicator parameter for each input parameter. The SQLstate and DiagnosticMessage parameters are the only database-supplied parameters that can be set by the function; changes to other parameters are ignored. The SQLstate parameter can be used to signal an error or warning to the database or can be used to return an application-defined result state. The UDF can assign a diagnostic string to the DiagnosticMessage parameter when the SQLstate parameter has been set to provide the invoker more diagnostic information. DB2 UDB fills in the function name and specific parameter names with the fully qualified name prior to calling the UDF.
The DB2SQL parameter style uses the same parameter order and types as the SQL parameter style. However, DB2SQL allows the database to pass along additional parameters. If more than one of the following optional parameters are specified, they are passed to the external program in the order in which they are listed:
• SCRATCHPAD/NO SCRATCHPAD—If the UDF is created with this option, DB2 UDB provides that function a working memory area (up to 16 MB) so it can be used to save information from one function invocation to the next. The structure consists of a 4-byte integer containing the length of the scratchpad and the scratchpad memory area, which has a default size of 100 bytes.
• FINAL CALL/NO FINAL CALL—If this option is specified, DB2 sets the associated call-type argument with an integer value that reflects the current invocation environment. The first call to a UDF on the processing of an SQL statement sets the call parameter to -1. The call parameter is set to +1 on the final call of the UDF. The final call is actually an extra call to the UDF where no input arguments are passed. DB2 UDB sets the call parameter value to 0 on the other calls to the UDF. This option is often used in conjunction with the scratchpad to make it easier to do pre- and post-processing on the scratchpad work area.
• DBINFO/NO DBINFO—With this option, the database passes a pointer to a data structure containing general database information such as the name of the current database, the current user ID, and the code page setting. The exact structure definition can be found in the SQL UDF Include file (QSYSINC/H.SQLUDF).
In Figure 6, an external UDF example of the DB2SQL parameter style in action, the NTHBEST function uses the SCRATCHPAD and FINAL CALL options to find the nth- largest value from a set of values. The C code for this function is not shown here, but you can download it at www.midrangecomputing.com/mc.
The NTHBEST function takes two parameters. The first parameter is a value from the set being scanned, and the second parameter is N, which is the rank of the desired value within the set. Each time the NTHBEST function is called, it returns the nth-largest value processed so far. If fewer than n values have been processed, the null value is returned. As it is called for each selected row, the NTHBEST function will first return n null values followed by an increasing series of values, the last value being the nth-best value. The nth- best UDF can be invoked within the maximum function so the calling SQL statement sees only the desired value. This SQL statement, SELECT MAX(nthbest(TotalSales,2)) FROM results1999, would return the second-highest sales figure from 1999.
The first step in the external program is to get the scratchpad work area ready for usage. This setup is done only on the first call to the UDF by checking the call type parameter for -1. The program needs to allocate enough space to continuously keep track of the n largest values. The setup shows how the SQLstate parameter can be used to send errors to the database when a UDF receives input values that it cannot handle.
On every call to the program except the final call, the function receives a new value and places it into its proper place in the scratchpad array (padPtr->bigValues) if it is bigger than one of the current n largest values. Since the final call does not pass a new value, there is no need for the UDF to process the input arguments or return a result. Instead, it can just free the scratchpad memory allocated on the first call to the UDF.
Invocation and Debug Considerations
As you have seen in some of the examples, a UDF can be called or referenced from any place that allows system built-in functions to be specified (the one notable exception being Check Constraint definitions). Just like system built-in functions, a UDF can be passed input in the form of a column name, a constant, a special register, a host variable, or the result of another function to provide maximum flexibility.
Some planning is needed before referencing your UDFs from an SQL statement, especially if you’re not going to qualify the function name. UDFs are objects just like tables, so, if the name isn’t qualified with a library name, DB2 UDB has to search the path to find the object that needs to be processed by the SQL statement. UDF names cannot be explicitly qualified when running under system naming (*SYS) mode, so the path searching takes on added significance. The *SQL naming mode does allow explicitly qualified function references, such as pgmlib.Celsius (TemperatureColumn). In the past, the path was the AS/400 library list, but UDFs have their own path. When running your SQL statements under the *SYS mode, the path still defaults to *LIBL. When running under the *SQL naming mode, DB2 UDB will first look for unqualified functions in QSYS and QSYS2 and then look in a library matching the name of the current user ID (if one exists). The default behavior is about the same, but there are new precompiler options as well as new SQL statements (e.g., SET PATH and SET OPTION) available for overriding the defaults.
Finding the right function gets even more complex in those environments that use function overloading, so make sure that you design and architect carefully. Literal values
(e.g., ‘ABC’) also complicate function resolution because they are treated as VARCHAR data and are not compatible with functions with CHARACTER data parameters. More
detailed information on function paths and resolutions can be found in the AS/400 SQL reference manual and programming guide.
Debugging UDFs can be a little tricky, since DB2 UDB runs UDFs in a secondary system thread. This threaded invocation model prevents debugging of the underlying UDF program from the job that caused the UDF to be called. First, a service job has to be started with the Start Service Job (STRSRVJOB) CL command from another AS/400 session. The Start Debug (STRDBG) command can then be executed on the second AS/400 session specifying the program object used to implement the UDF.
The secondary threads used by DB2 UDB for calling UDFs can hold resources such as locks that can hinder concurrent system activity on shared objects. Thus, you want to create UDFs that are short, quick-running operations to reduce how long resources are held. It’s also not a good idea to have a UDF execute operations on the same database objects referenced by the SQL statement that invoked the UDF in the first place. The use of threads by DB2 UDB also makes it trickier to implement UDFs with non-ILE languages that are not threadsafe. Original Program Model (OPM) languages can be used for UDFs, but it’s highly recommended that you use the ILE languages because of their thread support.
Just the Beginning...
You’ve seen a number of ways in which UDFs can bring richer functionality and more flexibility to SQL-based data processing. UDFs allow new business processing to be added to the database itself.
Furthermore, they provide access to existing business calculations and processes written in external programming languages. UDFs also provide another alternative when you are migrating SQL statements from one database to another. And these are not even all their capabilities; I didn’t even touch on the object-oriented capabilities of UDFs. Hopefully, though, this article provides you with a greater understanding of UDFs as well as enough details to help you start adding UDFs to your AS/400 programming toolbox.
REFERENCES AND RELATED MATERIALS
• DB2 for AS/400 SQL Reference (SC41-5612, CD-ROM QB3AQ903)
• DB2 UDB for AS/400 Object Relational Support, Redbook (SG24-5409)
• “You’ve Reached Extension DB2 UDB,” Kevin Vandever, MC, October 1999
CREATE FUNCTION function name
RETURNS data type
LANGUAGE language id
SPECIFIC specific name
NOT DETERMINISTIC | DETERMINISTIC
CONTAINS SQL | NO SQL | READS SQL DATA | MODIFIES SQL DATA
FENCED | NOT FENCED
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
EXTERNAL ACTION | NO EXTERNAL ACTION
NO SCRATCHPAD | SCRATHPAD integer
NO FINAL CALL | FINAL CALL
DISALLOW PARALLEL | ALLOW PARALLEL
NO DBINFO | DBINFO
function body CREATE FUNCTION Celsius (FahrenheitTemp INTEGER)
Figure 1: Create Function has many options.
RETURNS INTEGER LANGUAGE SQL
NO EXTERNAL ACTION
BEGIN RETURN ((FahrenheitTemp - 32) * 5/9);
END CREATE FUNCTION CONVERT_CURRENCY(AMOUNT DECIMAL, CURRENCY_TYPE CHAR(3))
/* Take the input amount and convert it to US Dollars */
RETURNS DECIMAL(11,2) LANGUAGE SQL
NO EXTERNAL ACTION NOT DETERMINISTIC
DECLARE STMT CHAR(256);
DECLARE EXCHANGE_RATE DECIMAL(5,4);
DECLARE AMOUNTVAR DECIMAL(11,2);
DECLARE C1 CURSOR FOR S1;
/* Setup handler in case the input currency type is not found */
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET RATEVAR = NULL;
/* Make sure the input amount isn't negative */
SET AMOUNTVAR = ABSVAL(AMOUNT);
/* Get the rate */
SET STMT = 'SELECT CURRENCY_RATE FROM RATETABLE WHERE
CURRENCY_CODE = ?';
PREPARE S1 FROM STMT;
OPEN C1 USING CURRENCY_TYPE;
FETCH C1 INTO EXCHANGE_RATE;
IF EXCHANGE_RATE IS NOT NULL THEN
END /* ====================================================== */
/* Farenheit to Celsius conversion function */
/* Compilation steps: */
/* CRTCMOD MODULE(PGMLIB/CELSIUS) SRCFILE(PGMLIB/QCSRC) */
/* CRTSRVPGM SRVPGM(PGMLIB/CELSIUS) EXPORT(*ALL) */
/* SQL to create the function */
/* CREATE FUNCTION Celsius (INTEGER) */
/* RETURNS INTEGER */
/* EXTERNAL NAME 'PGMLIB/CELSIUS(celsius)' */
/* LANGUAGE C */
/* NO SQL */
/* NO EXTERNAL ACTION */
/* PARAMETER STYLE GENERAL */
/* DETERMINISTIC */
/* ALLOW PARALLEL; */
/* ====================================================== */
long celsius(long *fahrenheit)
return ((*farenheit-32) * 5/9);
Figure 2: The SQL UDF language has a simple syntax.
Figure 3: The SQL UDF language is rich enough to create complex business rules.
Figure 4: UDFs can be written in high-level languages.
Figure 5: The parameter list is expanded under the SQL style.
CREATE FUNCTION mylib/nthbest(Integer, Integer)
EXTERNAL NAME 'MYLIB/NTHBEST(fn_nthbest)'
NO EXTERNAL ACTION NOT DETERMINISTIC
PARAMETER STYLE DB2SQL
Figure 6: This example of Create Function illustrates the DB2SQL parameter style.