Improve UDF performance with a new flavor of deterministic.
After V5R3 was released, I wrote an article highlighting the use of the NOT FENCED and DETERMINISTIC options to improve the performance of User-Defined Functions (UDFs). Those are still great options to use to boost performance. However, more recently the IBM i 7.2 release extended the possible reach of the Deterministic option with a new flavor known as statement-level deterministic.
My previous article describes how the DETERMINISTIC option can improve UDF performance since it enables the Db2 engine to cache the input and output values from function calls. That cache allows Db2 to skip the overhead of calling the function and instead just return the cached output value. The following UDF that converts a Fahrenheit temperate reading to the Celsius equivalent is a good example of a deterministic function. Anytime the function is called with an input value of 32 degrees, the output value will always be a Celsius degree value of 0. There’s no need for Db2 to call the function when its cache shows that an input value of 32 always produces an output value of 0.
CREATE FUNCTION getCelsius( fahrenheit_temp INT)
RETURN( CEILING( (5*(fahrenheit_temp-32) )/9) );
The original DETERMINISTIC cache had a global scope. This global scope meant that Db2 was able to use the cached function values for any invocation of the function. If a deterministic function was last called a couple of days ago, the cached values could be used by Db2. If a deterministic function was called by a different SQL statement than the current SQL statement, the cached values could be used by Db2.
IBM received feedback through the years from customers and partners that this global scope was too broad. Developers shared that they had UDFs that relied on current information like currency conversion rates that wouldn’t work with a cache based on a global scope. However, these UDFs would benefit from caching scoped to a statement level, meaning only using cached function values that were generated during the execution of the current statement. That feedback led to IBM delivering support for a STATEMENT DETERMINISTIC option as shown in the syntax diagram in Figure 1.
If a flavor of the DETERMINISTIC option is not specified in the function definition, you can see that Db2 defaults to the original global scope behavior. The STATEMENT DETERMINISTIC option has to be explicitly defined in order for Db2 to change the scope of the function caching from a global level to a statement level.
Figure 1: Deterministic option syntax diagram
My previous article used a GetLocalTemp function as an example of a UDF that should be defined as NOT DETERMINISTIC. That UDF took a ZIP code as an input parameter and then used a web service to return the current temperature for that ZIP code. Because the temperature for a ZIP code is going to change over time, it made no sense to allow Db2 to cache the input and output values for the calls to the GetLocalTemp UDF.
The STATEMENT DETERMINISTIC option now opens the door for the GetLocalTemp UDF to benefit from deterministic caching. Instead of allowing cached values to be used across a global scope, the statement flavor restricts the usage of the cached values to the function invocations that occur within the execution of an SQL statement.
Let’s assume the customers table referenced on the following query contains 10 rows, which consist of 5 customers with a ZIP code value of 55901 and 5 customers with a ZIP code value of 51106. Provided that the GetLocalTemp UDF has been created with the STATEMENT DETERMINISTIC option, Db2 would only need to call the UDF two times and would be able to reuse cached output values on the other eight rows. The UDF would be called the first time it processes a row with a ZIP code value of 55901 and the first time it processes a row with a ZIP code value of 51106.
SELECT customer_zip, GetLocalTemp(customer_zip) FROM customers
If this query were to run 1 minute later by the same user or a different user, the UDF invocation behavior would be the same: two calls to the UDF and eight reuses of the cached values. A GLOBAL DETERMINISTIC UDF could have been able to reuse the cached results from the previous run of the query, but this STATEMENT DETERMINISTIC UDF can only use cached results generated during the statement execution.
With information about this new deterministic option in hand, you should make a plan to review your UDFs and see if it can be leveraged.