Many iSeries shops host additional database servers, including Oracle, SQL Server, and MySQL. A common solution to integrating heterogeneous data consists of nightly data transfers and batch processing routines. However, it's often necessary to have real-time access to multiple data sources to provide immediate answers to immediate questions. And waiting for nightly processing just won't do.
In V5R2, this query is possible (albeit with a slightly different syntax) through an SQL routine called a user-defined table function (UDTF). UDTFs are "pseudo tables" that act like normal tables as far as SQL is concerned (except they're read-only). Rather than retrieve data from a physical file, a UDTF gets its data from a program.
UDTFs written in the SQL Procedure Language (SPL) are generally used to access data from the local database. UDTFs written in languages such as RPG or Java are called "external" and can be written to retrieve any kind of data--including database data, APIs, or external data--from any source, depending on the capabilities of the language being used.
(If you lack Java knowledge, don't bail yet. The Java skills required are minimal. All you need is an understanding of Java syntax and JDBC. The iSeries requires the Developer Kit for Java 5722-JV1 and Toolbox for Java 5722-JC1 products to be installed.)
The Java Connection
Using standard iSeries tools, how can programmers make the iSeries query another database? The answer is with Java Database Connectivity (JDBC). JDBC provides a standardized way for Java programs to communicate with a variety of databases. Each database vendor supplies a JDBC "driver" to handle the technical issues of allowing a connection to its database. If you're using Oracle, you'll need Oracle's JDBC driver. If you're using MySQL, you'll need a MySQL JDBC driver. And so forth. Here's a list of vendors that provide JDBC drivers.
In this article, I'll demonstrate a real-time connection from the iSeries to a SQL Server table. This will require the SQL Server 2000 JDBC driver, which you can download from Microsoft's Web site. After downloading the file, install it on a Windows machine so that you'll have access to the help. Copy the three required JAR files (msbase.jar, msutil.jar, and mssqlserver.jar) to your Java development folder (e.g., /home/dev) on the iSeries' IFS. Don't forget; you must register these JAR files for use with DB2/400. (For more information on registering JAR files, see the first sidebar at the end of this article.)
The SQL Wrapper
While Java has long afforded the ability to connect to external databases, it isn't always easy to integrate Java with legacy applications written in RPG, C, or COBOL. In contrast, SQL provides a number of interfaces for providing data to these languages, including embedded SQL, VIEWs, and Query Management queries. A UDTF puts an SQL face on the Java program. For this demo, I'll retrieve data from the Orders table in the Northwind sample database that comes with SQL Server. Figure 1 shows this table definition.
CREATE TABLE Orders (
Figure 1: This is the Orders table from the Northwind database.
To query this remote table with SQL/400, a table function needs to be defined using the CREATE FUNCTION statement. Figure 2 shows the statement needed to define a UDTF to retrieve data from the SQL Server Orders table.
CREATE FUNCTION PgmLib/Ext_Orders(parmWhere VARCHAR(512))
RETURNS TABLE (OrderID INT,
EXTERNAL NAME 'ExtDB.orders'
PARAMETER STYLE DB2General
RETURNS NULL ON NULL INPUT
Figure 2: The CREATE FUNCTION statement defines the columns returned by a table function. A table function is distinguished from a scalar function by the presence of the RETURNS TABLE phrase.
This statement registers a table function called Ext_Orders (external orders). When the function is used, a Java class is instantiated to connect to the SQL Server database and return the data row by row. Here's an example of how to use the table function:
FROM TABLE(Ext_Orders('EmployeeID=5')) AS Orders
- Table functions and their parameter lists must be enclosed by the TABLE() keyword and given a correlation name (as indicated by the AS keyword).
- UDTFs can be defined to accept multiple parameters. To maximize versatility and retrieval speed, the Ext_Orders function is passed a criteria parameter to limit the number of records. Passing an empty string ('') causes the entire remote table to be returned.
- When the function runs, the columns returned will match those in the RETURNS TABLE segment of the CREATE FUNCTION statement.
One other thing to note when creating Java functions is that a small service program with the same name as the function name will be created. Do not delete this program; use the DROP FUNCTION statement instead.
If you're unfamiliar with SQL functions, see the second sidebar at the end of this article for a rundown of the keywords specified in Figure 2.
Writing an External UDTF: The Java Example
Regardless of the language, UDTFs are easy to write-- especially for Java because the class to be written inherits functionality from an existing class called UDF (which is supplied by IBM in package com.ibm.db2.app). Don't be confused by the UDF/UDTF terminology. UDF refers to a user-defined function, which can be a scalar or table function, but UDTF refers only to a table function. The UDF class is used to write both scalar and table functions.
The downloadableExtDB.java program contains the code for table function Ext_Orders. The class and method combination (ExtDB.orders) is referenced in the EXTERNAL NAME portion of the CREATE FUNCTION command, as shown in Figure 2. Method orders() is defined with a return type of void and is the method that will be invoked continuously by the database manager.
Here's how a UDTF program interacts with SQL:
- First, the database manager (dbm) calls the external program once as an "OPEN" phase to give the program a chance to do initialization work. In the sample, a connection is established to SQL Server, and a ResultSet is opened against the Orders table. No data is returned yet.
- Next, the dbm calls the program in a "FETCH" phase, at which time it expects to be fed back a single row of data via output parameters. This FETCH phase repeats until the program signals it has no more data to return. This end-of-table condition is indicated by setting the SQL State to "02000" (no data). In the sample, the Orders table ResultSet is read one row at a time. This row data is returned to the dbm. When the ResultSet has no more rows, the SQL State is set to "02000."
- The dbm now calls the program again for a "CLOSE" phase to let the program perform any cleanup work. In the sample, the SQL Server Connection and ResultSet objects are closed.
- The dbm indicates to the program which phase it is in by passing it a "call type" parameter. It is the program's responsibility to properly respond to this call type indicator.
- Optionally, if the FINAL CALL keyword is specified on the CREATE FUNCTION, two additional call types are defined: FIRST and FINAL, which come before the OPEN and after the CLOSE, respectively. These phases can be used for additional initialization and cleanup work. In the sample, the FIRST call is used to verify that the SQL Server JDBC driver is available.
Figure 3 shows Java code that matches the logic outlined above. Method getCallType() is used to determine which stage the function is in. Method getCallType() and constants SQLUDF_TF_FIRST, SQLUDF_TF_OPEN, etc. are inherited from class UDF. For a complete list of variables and methods in superclass UDF, see page 213 of the IBM Developer Kit for Java.
// Invoked once, one time processing goes here
// Load SQL Server 2000 JDBC Driver
// Invoked once, processing for the open
// goes here. Establish connection to
// SQL Server and open ResultSet.
// Set multiple times, this flag signals
// that the columns for a single row
// should be returned to SQL. When
// there is no more data, set SQL
// state to "02000" (No Data)
// The close is called when SQL State "02000"
// is received from the FETCH. Cleanup is
// done here. The connection, statement
// and ResultSet is closed
// FINAL is specified after the close has
// completed. No additional processing is
Figure 3: This is the shell of the orders method that will be repeatedly called by the database manager.
The parameter list for the orders() method contains a list of the function's input parameters followed by the function's output parameters. The inputs are parameters defined in CREATE FUNCTION, and the outputs are the columns to be returned as defined in the RETURNS TABLE section of Figure 2. The method's parameter list must match the table function's parameters in number and data type. For tips on how to match data types between SQL and Java, see "Parameter passing conventions for Java stored procedures and UDFs" in the IBM Developer Kit for Java. The sample function has one input parameter (the WHERE criteria) and fourteen output parameters representing the columns the table function returns.
When the call type is OPEN, a connection is attempted to SQL Server. Replace your server's IP address, port (default is 1433), user, and password in the connection string. For a list of supported connection string options, see "Connection String Properties" in the SQL Server JDBC driver's help. The SELECT statement used to retrieve data from SQL Server is specified in the ResultSet object's executeQuery method. If an error is encountered, the SQL State is changed to indicate an error condition.
When the call type signals a FETCH, a row of data is to be returned. A row is read from the SQL Server ResultSet (rs). The set() method (inherited from UDF) is called for each column to be returned by the function. The parameters for set() are parameter number (starting with the number of the first output parameter in the method's signature) and data value. If set() is not called for a given parameter, then the corresponding table column will be NULL. This is useful for handling primitive types (int, short, etc.) that cannot contain NULLs. For example, notice that set() is not called for column ShipVia (type integer) if it is NULL.
Note that a String object was used to store dates rather than java.sql.Date because IBM's UDF class doesn't have a set() signature capable of accepting a Date type. The second parameter of set() only accepts the following types: short, int, long, double, float, BigDecimal, String, Blob, and Clob.
The orders() method is invoked continually with the fetch call type until the SQL State is set to "02000" (no data). Neglecting to set the end-of-table condition will result in an infinite loop. Method setSQLstate() is used to set the SQL State value.
When end-of-table is signaled, method orders() is called again with the CLOSE call type. At this point, the Connection, Statement, and ResultSet variables are closed. When FINAL CALL is specified on the CREATE FUNCTION statement, the method will be called one last time with a value of FINAL.
When an error occurs in a UDTF program, the SQL State should be set to indicate a user-defined error that will cause the invoking SQL statement to terminate. SQL State is a CHAR(5) field that communicates the status of the UDTF to the database manager. There are many predefined SQL States you should avoid. For safety, Java user-defined SQL error states may start with '38I' followed by an arbitrary two-digit code. Method setSQLmessage provides extra information to describe the problem.
If the UDTF signals an error during the OPEN call, the dbm will terminate the current SQL statement. In this case, the subsequent FETCH and CLOSE phases will not be called. Therefore, if an error occurs during the OPEN, make sure that proper cleanup is done. In contrast, if an error occurs during the FETCH, the database manager will call the program again with the CLOSE call type.
Compiling the Java Program
Once the Java program is written, enter QShell (QSH) and use the javac (compile Java program) command to compile the Java program. The resulting .class file must be placed in a special folder called /qibm/userdata/os400/sqllib/function. When invoking an SQL routine (function or stored procedure) written in Java, the iSeries database manager expects the Java class to be in this folder.
Here's a sample javac command that uses the -d (directory) option to specify the location of the newly compiled class.
javac -d /qibm/userdata/os400/sqllib/function /home/dev/ExtDB.java
When testing UDTFs, you may need to sign off and sign back on after re-creating a class file to make the JVM uses the newest version.
Using the Function
Here are examples of how to retrieve data from the Ext_Orders function (which will actually get its data from SQL Server):
/* Select orders shipped late */
/* Select Orders for specific employee */
/* Return all rows from table */
Further, you can use table functions anywhere that a normal table can appear (FROM, JOIN, VIEWs, subqueries, and table expressions):
Example 1: List All Unshipped Orders by Company
Use this type of query to combine multiple data sources into a single query. This will allow the iSeries to easily create reports from multiple sources without messy data transfers.
FROM TABLE(Ext_Orders('ShippedDate IS NULL')) LateOrders
JOIN Customers ON Customers.CustomerID=
ORDER BY CompanyName, RequiredDate DESC
Example 2: Place Data in a View for Use by Query/400 or an RPG Program
This view uses SQL Server's GetDate() function to retrieve the current date.
CREATE VIEW DATALIB/DAILYORDERS AS (
FROM TABLE(Ext_Orders('OrderDate=GetDate()')) AS Current)
Example 3: Create a Work Table for a Quarterly Subset of Data
CREATE TABLE QTEMP/EMPSALES AS (
'EmployeeID=8 AND OrderDate
BETWEEN ''01/01/1997'' AND ''03/31/1997'''))
ORDER BY OrderDate)
Connect to Anywhere in Real Time
Java's SQL abilities allow the iSeries to immediately hook to any database that has a JDBC driver. Wrapping the Java program in an SQL UDTF wrapper allows the data to be returned by SQL as though it were a local iSeries table. iSeries reports and inquires from external databases just became easier to write!
To see examples of querying from the "other direction," with SQL Server querying the AS/400 in real time, see "Patch Those Leaky Interfaces" and "Running Distributed Queries with SQL/400 and SQL Server 7.0" in the September/October 2000 issue of AS/400 Network Expert.
Sidebar: Registering the External JAR Files for SQL
For SQL to use a Java program that accesses JAR files (such as the SQL Server JDBC driver), SQL needs to know where the JAR files are. Under normal circumstances, the classpath environment variable indicates where to find the JARs. But things are different for SQL routines because the JARs must be registered using a stored procedure called install_jar. (If you have individual class files to use, these can be placed in the /qibm/userdata/os400/sqllib/function folder.)
Here is how to register the SQL Server JDBC driver's JAR files:
Three parameters are required :
- jar-url--This is the path and name of the JAR file to register ('file:' is the only supported URL scheme).
- jar-id--This is the optional schema (library) and JAR identifier (usually the same as the JAR file name).
- Deploy option--This option currently accepts only a zero (0) on the iSeries.
You need to note a few things about the registration:
- The install_jar procedure only appears to work in SQL sessions using the *SQL naming convention.
- Once the registration is completed successfully, the JAR information is stored in the QSYS2/SYSJAROBJECTS and QSYS2/SYSJARCONTENTS system tables.
- When using STRSQL, I had to issue a COMMIT in order to save the changes made by Install_Jar. I'm not sure if this is a bug or a feature. If you end the session without issuing a commit, the definitions do not appear in the system tables.
- When registering the JAR files, install_jar makes a copy of each JAR in a subfolder under /qibm/userdata/os400/sqllib/function. The subfolder will be called JAR/schema. In the examples above, the JAR files would be found in a subfolder called JAR/PRODLIB. If a schema isn't specified during the registration, the user name will be used, which is the default schema when using the *SQL naming convention. Replace "prodlib" with your own library name.
- Do not manually update or delete these JAR file copies. Instead, use the sqlj.remove_jar and sqlj.replace_jar stored procedures.
Sidebar: SQL Functions
Create Function Ext_Orders
CREATE FUNCTION names the UDTF and registers it with SQL. This function is called Ext_Orders (external orders). This name will be referenced by SQL statements using the function.
You may define an optional comma-delimited parameter list. For table functions, parameter lists are usually passed to a) limit the number of rows returned or b) assist in the calculation of column values.
This keyword defines the column list (column names and attributes) that the table function will return.
External Name 'ExtDB.orders'
This option is applicable only to functions written in a language other than SQL. The external name is a program name that will be called when the function is invoked. For Java programs, the external name is specified as Class.method. For non-Java programs, the name is specified as library/program.
This keyword specifies that the function is written in Java. Other options include SQL, COBOL, CL, RPGLE, C, etc.
Parameter Style DB2General
The parameter style keyword determines how the database manager will pass parameters to an external program. Research these options thoroughly before writing programs.
For Java, the DB2General style is a DB2-specific option that specifies that the table function will extend the com.ibm.db2.app.UDF class.
Another parameter style called "Java" is used when writing routines conforming to the SQLJ Part 1: SQL Routines standard. However, this standard only supports scalar (not table) functions, so DB2General must be used.
The fenced option specifies that multiple invocations of the function (i.e., the function is used more than once in a single statement) will run in separate threads so as to avoid potential conflict with each other.
This parameter, which is mainly for optimization, specifies that there are no DB2 SQL statements to be run by the function.
This option is required for UDTFs that cannot be multi-threaded.
This option specifies that a static storage area will be allocated to the function. This storage area will be preserved between calls to the function. If a number isn't specified, 100 bytes will be allocated.
This keyword is special for Java UDTFs because if it is not specified, Java will instantiate a new instance of the class each time the function is invoked.
This option specifies that the FIRST and FINAL call types will be invoked.
Returns Null on Null Input
If one or more input parameters passed to the table function are NULL, the function will return zero rows.