Conditional masking provides developers with another tool for protecting sensitive data.
In the first article that I wrote on the DB2 for i field procedure (FieldProc) support in the IBM i 7.1 release, I shared that the Field Procedure support delivered at GA should not be used to conditionally returned masked values due to the risk of data loss. Thus, a FieldProc program written for a credit card column on a decode call always needed to return the credit card number ('1100220033004400') instead of a masked version ('************4400') of the credit card number for unauthorized users.
The good news is that Level #14 of the IBM i 7.1 Database Group PTF lifts the masking restriction. This article details the enhancements that were made to allow Field Procedure programs to return masked values on the decode function call.
The new FieldProc Masking support revolves around two main components. The first component is a new parameter that was added to the parameter lists that the DB2 engine passes to the FieldProc program on each decode call. This new parameter controls whether or not the FieldProc program can return a masked value. There are some DB2 operations—such as the RGZPFM (Reorganize Physical File Member) command and trigger processing—that always require the clear-text version of the data to be returned. The second component is a new special SQLState value ('09501') that is to be returned by the FieldProc program whenever it is passed a masked value on the encode call. This prevents the masked value from being encoded, which would result in the original data value being lost. When this special SQLState value is returned, DB2 will ignore the encoded value that is passed back by the FieldProc program and instead use the value that's currently stored in the record image for that column. If this special SQLState is returned while a new row is being added to the table, then the default value for the FieldProc column will be stored in the record image.
Let's see these two new components of the FieldProc masking support in action by reviewing an example Field Procedure program implemented with RPG. Assume that this example Field Procedure program has been added to the cardnumber column in the orders table with the following SQL statement:
ALTER TABLE datalib/orders
ALTER COLUMN cardnumber
SET FIELDPROC pgmlib/maskpgm
At the beginning of the RPG code below , you should first notice the SqFPInfo variable declaration for the new parameter that's passed to FieldProc programs to communicate whether a masked value can be returned on a decode function call. The details of this new parameter can be found in the SQFP include file, which is found in file QSYSINC/QRPGLESRC.
No changes are needed for conditional masking for the one-time registration call (FuncCode=0) to the FieldProc program, which returns the attributes of the encoded value. As in the first article, the encoded value is twice the length of the original value.
The new special SQLState value is found in the program logic for the encode call. Before encoding the clear-text value being passed into the program, the code checks whether the first 4 bytes of the input value contain a masking character; in this case, an asterisk (*) is used as a masking character. When the logic determines that the input value begins with masking characters, the SQLState value of '09501' is returned, which results in DB2 using the current value stored in the record image for the FieldProc column. Returning the SQLState of '09501' prevents loss of the original data value where an unauthorized user using native record-level access reads a record from a DB2 table, which results in the application buffer containing the masked value; the application then later uses that same buffer to update the record. Prior to the new support, the masked value would have been encoded by the FieldProc program. Now, the FieldProc program can ignore the encode call by returning the special SQLState value.
One limitation with the new support is that if for some reason an application programs tries to insert a new row that contains a masked value for the FieldProc column, DB2 will populate the column with its default value instead of the masked value. This event is unlikely to occur, but you should be aware of the potential behavior.
D FuncCode S 5I 0
D p_FuncCode S *
D OptParms DS LikeDs(SQLFOPVD)
D EnCodTyp DS LikeDs(SQLFPD)
D DeCodTyp DS LikeDs(SQLFPD)
D EnCodDta S 512
D DeCodDta S 512
D SqlState S 5
D SqMsgTxt DS LikeDs(SQLFMT)
D* >>New parameter passed for decode calls
D SqFPInfo DS LikeDs(SQLFI)
D i S 10I 0
D En_ary S 1 DIM(512) Based(En_ary_p)
D De_ary S 1 DIM(512) Based(De_ary_p)
D e S 10I 0
D d S 10I 0
D PROC_NAME *PROC
D * Procedure name
D PGM_STATUS *STATUS * Status code
D PRV_STATUS 16 20S 0 * Previous status
D LINE_NUM 21 28 * Src list line num
D ROUTINE *ROUTINE * Routine name
D PARMS *PARMS * Num passed parms
D USER 254 263 * User name
D 358 367
C *Entry Plist
C Parm FuncCode
C Parm OptParms
C Parm DeCodTyp
C Parm DeCodDta
C Parm EnCodTyp
C Parm EnCodDta
C Parm SqlState
C Parm SqMsgTxt
C Parm SqFPInfo
SqlState = '00000' ;
If FuncCode = 8 ; // Return attributes about the Encoded value
// Verify this FieldProc program only being used for
// fixed-length character column
If DeCodTyp.SQLFST <> 452 and DeCodTyp.SQLFST <> 453 ;
// Return error for unsupported data type
SqlState = '38001' ;
// The Encoded value has almost all of the same attributes
// as the decoded value
// Start by making the encoded attributes identical
EnCodTyp = DeCodTyp ;
// Encoded value length is twice that of the decoded value
EnCodTyp.SQLFL = DeCodTyp.SQLFL * 2;
EnCodTyp.SQLFBL = DeCodTyp.SQLFBL * 2;
ElseIf FuncCode = 0 ; // Being called to Encode
// Check if value to be encoded is masked
If %Subst(DeCodDta:1:4) = '****';
// Use special SQLState value to tell DB2 to use
// value already in record, instead of encoding
// the masked value
SqlState = '09501';
// Value was NOT masked, proceed with Encode operation
// set basing pointers for data arrays
En_Ary_p = %Addr(EnCodDta);
De_Ary_p = %Addr(DeCodDta);
// set array indexes and counter
e = 1;
i = 1;
// Encode data by reversing order of card numbers and
// adding increasing number between each card number
For d = DeCodTyp.SQLFL downto 1;
En_Ary(e) = De_ary(d);
e += 1;
En_Ary(e) = %Char(i);
e += 1;
i += 1;
ElseIf FuncCode = 4 ; // Being Called to Decode value
// set basing pointers for data arrays
En_Ary_p = %Addr(EnCodDta);
De_Ary_p = %Addr(DeCodDta);
// set array indexes and counter
d = 1 ;
For e = EnCodTyp.SQLFL-1 By 2 DownTo 1;
De_Ary(d) = En_ary(e);
d += 1;
If SqFPInfo.SQLFNM = '0'; // If DB2 allows masking on this request
// Check if the decoded value must be masked for the current user
// by retrieving current user from RPG Program Status DS
If currentUserfromPSDS <>'QSECOFER' and
%Subst(DeCodDta:1:4) = '****';
Else ; // Invalid function code
SqlState = '38003' ;
The decode function call is where the conditional masking logic is executed. With conditional masking, the FieldProc program will return a masked version of the decoded value for most users, and the full decoded value will be returned for only a small set of authorized users.
The decode code first decodes the encoded value back to its original value. With the original value in hand, the decode logic needs to decide whether the original value or the masked version of the original value can be returned. Notice that before the decode logic checks whether the current user is an authorized user, the decode logic must first verify whether DB2 allows a masked value to be returned on this decode call by checking the value of the SQLFNM field within the new SqFPInfo parameter. A value of 0 indicates that it's OK for the decode logic to return a masked value. DB2 passes a value of 1 for the SQLFNM field for operations such as the RGZPFM command.
Once it's determined that returning a masked value is allowed, the decode logic retrieves the user profile of the current user from the RPG program status data structure. The current user profile is then validated against the list of authorized users; in this example, the authorized user set is limited to the QSECOFR and SECADMIN users. This authorized user verification logic is quite simple, but the FieldProc program is free to use any mechanism that it wishes in order to determine whether the current user is authorized or not.
When a FieldProc program contains conditional masking logic, the FieldProc column cannot be referenced on the Select/Omit criteria of the definition of a Select/Omit Logical File. FieldProc column references are supported on the Select/Omit criteria only when the FieldProc program always returns the original value on the decode call.
A side effect of the conditional masking FieldProc support is that it can cause queries and reports to return different results for different users. When there's a comparison involving a column with a FieldProc, the DB2 query optimizer can choose to compare the encoded or decoded version of the FieldProc column with the search string. Inconsistent query results across different users comes into play when the optimizer decides to compare the decoded version of the FieldProc column with the search string.
Analyzing the runtime behavior of the following Select statement will provide a deeper understanding of this side-effect behavior.
SELECT order_date FROM orders WHERE cardnumber='1101220233034404'
Assume that the query optimizer decides to compare the search string with the decoded version of the cardnumber column. That means that the query optimizer has internally rewritten the WHERE clause into the following syntax:
WHERE Decoded_Value(cardnumber)= '1101220233034404'
If an authorized user (QSECOFR or SECADMIN) runs the Select statement with this implementation, then the FieldProc program will always return the clear-text value of the cardnumber column. However, when a non-authorized user (JSMITH) runs this query, the decoded version of the cardnumber column will be a masked value ('************4404') which will never match the specified search string ('1101220233034404').
The query optimizer's behavior for FieldProc column comparisons is controlled by the FIELDPROC_ENCODED_COMPARISON QAQQINI option. (A list of supported values is found in Figure 1 later in this article.)
By default, this option is set to the *ALLOW_EQUAL value. Thus, the default behavior will be for the SQL Query Engine (SQE) optimizer to try to compare against the encoded version of the search argument. With this behavior, the query optimizer will attempt to implement the comparison with the following logic:
WHERE cardnumber = Encoded_Value('1101220233034404')
This implementation offers better performance because the FieldProc program is called only once to produce an encoded version of the search argument instead of being called to decode every cardnumber value stored in the table. The *ALLOW_EQUAL setting also causes the optimizer to use the encoded version of the FieldProc column when performing Group By and Distinct processing.
The good news is that the *ALLOW_EQUAL setting improves performance with usage of the encoded value, but the bad news is that not all parts of the query implementation can use the encode value. The decoded version of a FieldProc column is used on inequality (>, <=, etc.) comparisons, LIKE predicates, and the processing for the Order By and the Min/Max functions. The following SQL statement represents the optimizer's implementation for a more complex request.
SELECT order_date FROM orders WHERE
WHERE cardnumber = Encoded_Value('1101220233034404') OR
Decoded_Value(cardnumber) LIKE '1101*'
With the optimizer using a mix of encoded and decoded value of a FieldProc column in query implementations, it's easy to see how FieldProc programs with conditional masking support can cause inconsistent query results.
Allow query optimizer to use the encoded value of FieldProc columns in comparisons to optimize performance. This optimization is allowed for Equal and Not Equal comparisons along with GROUP BY & DISTINCT processing.
In addition to optimizations supported by *ALLOW_EQUAL, optimizer can use encoded values with inequality comparisons (>, <=, …), ORDER BY, and MIN/MAX grouping functions.
Optimizations with encoded value are allowed on LIKE predicate in addition to optimizations supported by *ALLOW_RANGE.
Optimizer is not allowed to optimize performance by using encoded value of FieldProc columns.
Figure 1: Supported values for the FIELDPROC_ENCODED_COMPARISON option
As you can see in Figure 1, the values of *ALLOW_RANGE and *ALL do enable the encoded value of a FieldProc to be used in more aspects of the query implementation to improve performance. However, those values may produce unexpected results when the encoded values produce a different ordering than the original data value. For instance, the *ALLOW_RANGE and *ALL values would result in the query optimizer ordering the results for the following Select statement by the encoded value of the cardnumber column.
SELECT order_date, order_amount, cardnumber FROM orders
ORDER BY cardnumber
Most applications expect the query results to be sorted based on the decoded value of the cardnumber column. Figure 2 demonstrates what the results sorted by the encoded value of the cardnumber column could look like. Based on this behavior, it's clear that the value of the FIELDPROC_ENCODED_COMPARISON QAQQINI option should be chosen only after careful planning has occurred. Remember that it is possible to change a QAQQINI option value between query executions. The DB2 for i Performance and Query Optimization Guide contains additional information on using QAQQINI options.
Figure 2: Output from Select statement sorted by encoded version of CARDNUMBER column
*NONE is the only value of the FIELDPROC_ENCODED_COMPARISON QAQQINI option yet to be discussed. The *NONE value is the only value supported by the Classic Query Engine (CQE). With the IBM i 7.1 release, all Query/400 and OPNQRYF (Open Query File) requests are processed by CQE, which means that those two interfaces will always operate with a value of *NONE, regardless of what value has been assigned to the FIELDPROC_ENCODED_COMPARISON QAQQINI option.
Setting the QAQQINI option to *NONE causes the optimizer to always use the decoded version of FieldProc column values. Performance will suffer with this setting because it will cause many calls to the FieldProc program to decode values stored in FieldProc columns. On the positive side, the *NONE behavior is the simplest to understand because all parts of a query implementation use the decoded value.
As mentioned earlier, the *NONE implementation can be confusing when the FieldProc program contains conditional masking logic because the decoded value of a FieldProc column will vary across users. In these cases, a developer may need to have two different versions of a query: one query for authorized users to run and a different query for unauthorized users.
As you can see, the new conditional masking support for Field Procedures has several configuration options and side effects that need to be reviewed before diving into implementation. Despite this complexity, the conditional masking support definitely provides developers with another tool for protecting sensitive data.