In today's high-tech IT world, it's common for shops to share
data on multiple platforms. For those trusted with the task of writing the
interfaces to share the data, there are many annoyances. For example, who hasn't
sent data from one platform to an export file, updated flags in the database to
signal the data has been sent, and then discovered that the export file never
made it to its final destination on the remote system? Worse, even when the data
does make it to the remote system, it's common to encounter an error resulting
in partial data updates, which often makes restarting the entire process a
Fortunately, for those sharing data between Microsoft's SQL Server and the iSeries, there's a splendid aid at your disposal: distributed transactions (DTs). DTs are functionally similar to local database transactions in that they have a beginning boundary, data modification statements, and an ending boundary whereupon the data changes that occurred are either committed or rolled back. However, distributed transactions extend the concept further by allowing data modification statements to occur against databases on multiple platforms.
Think of how moving data between disparate systems would be simplified with distributed transactions:
1. A transaction boundary is created.
2. Data is moved from the source platform to the destination platform.
3. The source platform marks its data as sent.
4. If everything is successful, all of the changes are committed on both systems.
5. If there is a failure, all of the changes are rolled back. When the error condition is fixed, the process can easily resume.
Since transactions involve the "all or nothing" concept, the programmer is assured the data is successfully changed on both platforms or on neither. Never again need we fuss over where to pick back up in the multiplatform processing cycle or reset flags to send data again!
The SQL Server documentation gives a good introduction to DTs and explains how they work. This article covers the basics of performing a DT using SQL Server's Transact-SQL (T-SQL).
Before a DT can run against the iSeries, four requirements must be met:
- iSeries files that participate in a DT must be journaled.
- The Client Access V5R1 ODBC driver must be installed on the SQL Server machine.
- A linked server to the iSeries must be configured.
- The SQL Server Distributed Transaction Coordinator (DTC) must be started.
The first requirement is that the iSeries physical files to be modified must be journaled. Tables created in a schema (library) that was created by the CREATE SCHEMA statement are journaled automatically. To verify whether a physical file is journaled, use the Display File Description (DSPFD) command. If it is not journaled, use the Start Journal Physical File (STRJRNPF) command to start journaling. If you need help with iSeries journaling concepts (journals, receivers, etc.), see chapters 19 and 20 of the Backup and Recovery Guide.
The second requirement involves installing the Client Access ODBC Driver (V5R1 or higher with the latest service pack) on the SQL Server machine. OS/400 has to be at V5R1 or higher as well. (Starting with V5R2, Client Access has been renamed to iSeries Access, but I will refer to it as Client Access here.) In case you're wondering, the Client Access OLE DB provider IBMDA400 does not currently support distributed transaction processing and therefore cannot be used.
Once the Client Access ODBC Driver is installed, configure an ODBC data source to the iSeries under the System Data Source Name (DSN) tab. For this article, I named my DSN "ISERIES" and used the default options.
The third requirement is to configure a linked server (requires SQL Server 7.0 and above--SQL Server 2000 is used here). A linked server definition allows SQL Server to access tables from a remote database as though they were part of its own local database.
To configure the linked server, start the SQL Server Enterprise Manager. Navigate the tree hierarchy and select the server you want to work with. Expand the server and then expand the "Security" node. Right-click on "Linked Servers" and choose "New Linked Server." In the linked server name box, enter ISERIES again, for consistency with the ODBC DSN. This linked server name will be used to refer to iSeries tables when working with T-SQL.
Under server type, choose "Other Data Source" and select the "Microsoft OLE DB Provider for ODBC Drivers" in the provider name combo box. Under "Product Name" enter "DB2 for iSeries." In the "Data Source" box, enter a valid iSeries DSN (if following along with this example, enter ISERIES.) In the "Provider String" box, you may optionally enter any DSN overrides. For example, to make the iSeries library TESTDATA the default library, enter DBQ=TESTDATA, where DBQ is the Client Access ODBC Driver's keyword to override the library list.
Next, you need to establish the security credentials for the linked server. Click on the Security tab of the "Linked Server Properties" window. In this window, SQL Server gives the option to define a login cross-reference to link the credentials of a specific SQL Server user to a specific iSeries user, but for simplicity, I will not use this feature in this example. In the bottom half of the window, there are options for login definitions not specified in the cross-reference list. Choose the "Be made using this security context" option (SQL Server 7.0's option is "They will be mapped to") and enter a valid iSeries user name and password in the boxes below. Whenever SQL Server attempts to talk to the iSeries linked server, it will use the login information specified here. The linked server has now been configured. Click the OK button.
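The same linked server can also be defined in T-SQL rather than through Enterprise Manager. The following is a minimal sketch using the system stored procedures sp_addlinkedserver and sp_addlinkedsrvlogin. The user name and password are placeholders, and the provider string assumes the optional TESTDATA default library mentioned earlier.

```sql
-- Define a linked server named ISERIES that goes through the
-- Microsoft OLE DB Provider for ODBC Drivers (MSDASQL) to the ISERIES DSN.
EXEC sp_addlinkedserver
     @server     = 'ISERIES',
     @srvproduct = 'DB2 for iSeries',
     @provider   = 'MSDASQL',
     @datasrc    = 'ISERIES',
     @provstr    = 'DBQ=TESTDATA'   -- optional DSN override

-- Map every SQL Server login that has no explicit cross-reference
-- to a single iSeries user profile (placeholder credentials).
EXEC sp_addlinkedsrvlogin
     @rmtsrvname  = 'ISERIES',
     @useself     = 'false',
     @locallogin  = NULL,
     @rmtuser     = 'MYUSER',
     @rmtpassword = 'MYPASSWORD'
```

Passing NULL for @locallogin corresponds to the "logins not specified in the cross-reference list" half of the Security tab.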
The last step involves starting the Microsoft SQL Server DTC service. The DTC, which can be started from the SQL Server Service Manager utility, is responsible for handling DT processing across multiple database servers.
Accessing Data on a Linked Server
To verify that the linked server is set up correctly,
run a distributed query (DQ). A DQ is a T-SQL query that accesses data on a
linked server. One way to run a DQ is to specify a four-part table name in the
FROM clause of a SELECT. Specifically, for an iSeries-linked server, the
four-part table name is specified as follows:
FROM linked server.RDB name.schema name.table name
For example, if your linked server is named "ISERIES," your iSeries' relational database (RDB) name is S1024000 (it's usually the same as your system name), your schema (library) is LIVEDATA, and your table is ORDERS, you would enter the following to retrieve the table's data:
SELECT *
FROM ISERIES.S1024000.LIVEDATA.ORDERS
This will allow SQL Server to query the ORDERS table on your iSeries as though it were local to SQL Server. Start the Query Analyzer utility, and try it! In fact, using the four-part syntax shown above, you can place an iSeries table in the FROM, JOIN, subquery, or nested select portion of a SELECT statement. The better news is that linked server tables can also participate in UPDATE and DELETE statements (provided the linked server's ODBC or OLE DB drivers are capable, which is the case with the Client Access ODBC driver.)
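As a sketch of a linked table on the receiving end of a data modification statement, the statements below assume that the ORDERS table has ORDERID and ORDSTATUS columns. Both column names are hypothetical and are used only for illustration.

```sql
-- Flag one order as shipped directly on the iSeries.
UPDATE ISERIES.S1024000.LIVEDATA.ORDERS
SET    ORDSTATUS = 'S'
WHERE  ORDERID = 1001

-- Remove the same order from the iSeries table.
DELETE FROM ISERIES.S1024000.LIVEDATA.ORDERS
WHERE  ORDERID = 1001
```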
Another way to run a DQ is to use the OPENQUERY function. OPENQUERY submits a passthrough query to the backend database engine for processing and returns the results as though they were a SQL Server table. OPENQUERY requires two parameters: a linked server name and an SQL statement. The following is an example of how to use OPENQUERY:
SELECT *
FROM OPENQUERY(ISERIES,'Select * From LiveData.Orders')
The main difference between the two examples is that, with the four-part table name syntax, SQL Server queries less efficiently than with OPENQUERY. OPENQUERY avoids much of SQL Server's overhead by submitting a SQL statement directly to the linked server's database engine. To do this, however, the SQL statement supplied to OPENQUERY must conform to the linked server's SQL dialect. In other words, you can't submit a T-SQL statement to an iSeries linked server.
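To illustrate the difference, the two statements below request the same rows. The CUSTOMERID column is a hypothetical name used for illustration. In the first form, SQL Server decides how much of the work to hand to the iSeries. In the second, the whole statement, including the WHERE clause, is written in the iSeries SQL dialect and runs there.

```sql
-- Four-part name: SQL Server parses and plans the query itself.
SELECT *
FROM   ISERIES.S1024000.LIVEDATA.ORDERS
WHERE  CUSTOMERID = 100

-- OPENQUERY: the quoted statement is passed through to the iSeries as-is.
SELECT *
FROM   OPENQUERY(ISERIES,
       'SELECT * FROM LIVEDATA.ORDERS WHERE CUSTOMERID = 100')
```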
Many DQ performance considerations are beyond the scope of this article. For some of the iSeries-specific performance considerations, see "Running Distributed Queries with SQL/400 and SQL Server 7.0" in the September/October 2000 issue of AS/400 Network Expert. For more information on DQs, see the SQL Server T-SQL documentation on the OPENQUERY, OPENROWSET, and distributed query topics.
Running a Distributed Transaction
Now, we're at the heart of the topic. For this
demonstration, on the SQL Server side, I'll be using the NORTHWINDCS sample
database, which is included with Office XP (you could also use the sample
database called NORTHWIND that comes with Office 2000). I'll focus on a
particular table called Products, which is the Product Master table.
For this example, assume that an identical Products table exists on the iSeries and that these two tables need to be synchronized at five-minute intervals. The structure of the Products tables for each platform is shown in Figure 1.
Figure 1: These are the Products tables from the NORTHWIND database as
they exist within SQL Server and the iSeries. The Synchronized column was added
to both for tracking an item change.
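The figure itself is not reproduced in this text. As a rough stand-in, the iSeries table might be created as sketched below. The column list follows the standard Northwind Products table. The DB2 data types, the SMALLINT flag columns, and the default values are assumptions, not the original figure.

```sql
-- DB2 for iSeries sketch, run on the iSeries with SQL naming.
-- Data types and defaults are assumed, not taken from Figure 1.
CREATE TABLE NORTHWIND.PRODUCTS (
    PRODUCTID       INTEGER       NOT NULL PRIMARY KEY,
    PRODUCTNAME     VARCHAR(40)   NOT NULL,
    SUPPLIERID      INTEGER,
    CATEGORYID      INTEGER,
    QUANTITYPERUNIT VARCHAR(20),
    UNITPRICE       DECIMAL(19,4),
    UNITSINSTOCK    SMALLINT,
    UNITSONORDER    SMALLINT,
    REORDERLEVEL    SMALLINT,
    DISCONTINUED    SMALLINT      NOT NULL DEFAULT 0,
    SYNCHRONIZED    SMALLINT      NOT NULL DEFAULT 0
)
```

Declaring a primary key also gives the table a unique index on PRODUCTID.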
For simplicity, assume that the synchronization will flow in only one direction. The Products table on the SQL Server side is the "master"--that is, changes to the Products table have to be done through a SQL Server application. Further, changes to the iSeries table will only be those resulting from the synchronization process.
To try this scenario, open Query Analyzer, select the NorthwindCS database, and issue the following SQL statement to add a "synchronized" flag to the Products table:
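The statement itself is not reproduced in this text. A minimal sketch, assuming a BIT column named Synchronized that defaults to 0 (not yet synchronized):

```sql
-- Assumed definition: existing rows receive the default value 0
-- because the new column is declared NOT NULL with a DEFAULT.
ALTER TABLE Products
ADD Synchronized bit NOT NULL DEFAULT 0
```

With this default, every existing product is treated as changed the first time the synchronization runs, which is harmless but does mean one full pass of updates.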
Next, create a schema (library) on your iSeries called NORTHWIND using the "CREATE SCHEMA NORTHWIND" SQL statement. Create the Products table in schema NORTHWIND using the second CREATE TABLE statement shown in Figure 1 (remember to use the appropriate SQL naming convention). This table will be journaled automatically. Finally, copy the Products table data from SQL Server to the iSeries using the distributed query shown in Figure 2.
Figure 2: This distributed query will insert data into the iSeries
Products table from the SQL Server Products table.
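The figure is not reproduced in this text. The sketch below shows what such a distributed query might look like, based on the description that follows. It assumes the RDB name S1024000 from the earlier example, the standard Northwind column names, and SMALLINT flag columns on the iSeries.

```sql
-- Sketch only: copy products that are not yet on the iSeries.
INSERT INTO ISERIES.S1024000.NORTHWIND.PRODUCTS
       (PRODUCTID, PRODUCTNAME, SUPPLIERID, CATEGORYID, QUANTITYPERUNIT,
        UNITPRICE, UNITSINSTOCK, UNITSONORDER, REORDERLEVEL, DISCONTINUED,
        SYNCHRONIZED)
SELECT p.ProductID, p.ProductName, p.SupplierID, p.CategoryID,
       p.QuantityPerUnit, p.UnitPrice, p.UnitsInStock, p.UnitsOnOrder,
       p.ReorderLevel, CAST(p.Discontinued AS smallint), 1
FROM   Products AS p
WHERE  NOT EXISTS (SELECT *
                   FROM   OPENQUERY(ISERIES,
                          'SELECT PRODUCTID FROM NORTHWIND.PRODUCTS') AS r
                   WHERE  r.PRODUCTID = p.ProductID)
```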
Look at Figure 2's INSERT statement. The four-part table name syntax is specified as the table to receive the data. The SELECT portion consists of the SQL Server Products table with a subselect to the iSeries Products table again, to make sure a duplicate record isn't inserted (of course, all records will be inserted the first time through).
In the subselect, though, the iSeries Products table is embedded in the OPENQUERY function instead of the four-part table name syntax. In this case, the reason for using OPENQUERY instead of the four-part table name has to do with performance.
Now that the tables are synchronized, subsequent inserts, changes, and deletes to the SQL Server table have to be tracked and moved to the iSeries table. Figure 3 shows a complete T-SQL stored procedure to do this.
Figure 3: This stored procedure will propagate adds, updates, and deletes
from the SQL Server Products table to the iSeries Products table.
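The figure is not reproduced in this text. The sketch below follows the three sections described in the paragraphs after it. It is an approximation, not the original procedure: the name SyncProducts and the RDB name S1024000 are assumed, and the UPDATE carries only four columns for brevity (the rest would follow the same pattern).

```sql
CREATE PROCEDURE SyncProducts   -- hypothetical name
AS
SET NOCOUNT ON
SET XACT_ABORT ON   -- needed for linked server DML inside a transaction

DECLARE @ProductID int, @ProductName nvarchar(40), @UnitPrice money,
        @UnitsInStock smallint, @Discontinued bit, @Err int

-- Section 1: insert products that do not exist on the iSeries yet.
INSERT INTO ISERIES.S1024000.NORTHWIND.PRODUCTS
       (PRODUCTID, PRODUCTNAME, SUPPLIERID, CATEGORYID, QUANTITYPERUNIT,
        UNITPRICE, UNITSINSTOCK, UNITSONORDER, REORDERLEVEL, DISCONTINUED,
        SYNCHRONIZED)
SELECT p.ProductID, p.ProductName, p.SupplierID, p.CategoryID,
       p.QuantityPerUnit, p.UnitPrice, p.UnitsInStock, p.UnitsOnOrder,
       p.ReorderLevel, CAST(p.Discontinued AS smallint), 1
FROM   Products AS p
WHERE  NOT EXISTS (SELECT *
                   FROM   OPENQUERY(ISERIES,
                          'SELECT PRODUCTID FROM NORTHWIND.PRODUCTS') AS r
                   WHERE  r.PRODUCTID = p.ProductID)

-- Section 2: push changed products, one distributed transaction each.
-- STATIC takes a snapshot, so flag updates do not disturb the loop.
DECLARE ChangedProducts CURSOR LOCAL STATIC FOR
    SELECT ProductID, ProductName, UnitPrice, UnitsInStock, Discontinued
    FROM   Products
    WHERE  Synchronized = 0

OPEN ChangedProducts
FETCH NEXT FROM ChangedProducts
      INTO @ProductID, @ProductName, @UnitPrice, @UnitsInStock, @Discontinued

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN DISTRIBUTED TRANSACTION

    -- First update: the iSeries copy of the product.
    UPDATE ISERIES.S1024000.NORTHWIND.PRODUCTS
    SET    PRODUCTNAME  = @ProductName,
           UNITPRICE    = @UnitPrice,
           UNITSINSTOCK = @UnitsInStock,
           DISCONTINUED = CAST(@Discontinued AS smallint),
           SYNCHRONIZED = 1
    WHERE  PRODUCTID = @ProductID
    SET @Err = @@ERROR

    -- Second update: mark the local row as synchronized.
    IF @Err = 0
    BEGIN
        UPDATE Products SET Synchronized = 1 WHERE ProductID = @ProductID
        SET @Err = @@ERROR
    END

    IF @Err = 0
        COMMIT TRANSACTION
    ELSE
        ROLLBACK TRANSACTION

    FETCH NEXT FROM ChangedProducts
          INTO @ProductID, @ProductName, @UnitPrice, @UnitsInStock,
               @Discontinued
END

CLOSE ChangedProducts
DEALLOCATE ChangedProducts

-- Section 3: delete iSeries products that no longer exist locally.
DELETE r
FROM   ISERIES.S1024000.NORTHWIND.PRODUCTS AS r
WHERE  NOT EXISTS (SELECT * FROM Products AS p
                   WHERE  p.ProductID = r.PRODUCTID)
```

Under the assumed name, the procedure would be run with EXEC SyncProducts.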
Notice that XACT_ABORT is set to ON. With the setting OFF, SQL Server rolls back only the statement that failed and lets the rest of the transaction continue. To do that against a linked server, SQL Server needs the provider to support nested transactions, which the CA ODBC driver does not allow, so data modification statements inside a transaction error out. With XACT_ABORT set to ON, a run-time error rolls back the entire transaction and ends the batch, so no nested transaction is needed. By implication, this setting also prevents SQL Server from doing partial rollbacks.
The first code section is a repeat of the code already shown in Figure 2. An INSERT statement is used to move all new records from the SQL Server Products table to the iSeries table.
The second section involves reflecting all changes to the products in the SQL Server table on the iSeries. A cursor is opened against the local Products table to select all products that have changed. Inside the loop, the BEGIN DISTRIBUTED TRANSACTION statement is executed to start a transaction for each item. In this case, each product update will be treated as a single transaction. If your situation requires either all or none of the Product updates to occur, you can specify the BEGIN and COMMIT transaction boundaries outside of the loop.
Inside the loop, an UPDATE is issued against the iSeries table for each field. After the update is completed, SQL Server's Synchronized column is set to True to indicate that the two tables are in sync for the given ProductID. After the second update is completed, the transaction is committed or rolled back, depending on whether an error occurred. This is where the power of the DT shines: The SQL Server Synchronized flag will not be set to True unless the data is successfully placed on the iSeries.
The third and final section deletes all products from the iSeries table that no longer exist in the SQL Server table. Again, the four-part table name is specified, and an EXISTS clause is used to see if the ProductID on the iSeries still exists in the SQL Server Products table. You probably realized that the INSERT and DELETE statements were not embedded inside of a BEGIN DISTRIBUTED TRANSACTION block. This is because DT processing isn't required here, since data is being updated on only one platform.
Writing that stored procedure was relatively painless--it's hardly different from a procedure written to synchronize two local tables! However, there are still two additional requirements to make the synchronization take place. The first requirement is to set the Synchronized flag to False (0) whenever a product is changed. You can do this through either the application program or an update trigger. The second necessity is to schedule this stored procedure to run at regular intervals using SQL Server Agent or some other scheduling mechanism.
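For the first requirement, an update trigger might look like the sketch below. The trigger name is hypothetical, and the sketch assumes the BIT column is named Synchronized.

```sql
-- Sketch only: reset the flag whenever a product row is changed.
CREATE TRIGGER trgProductsUnsync ON Products
FOR UPDATE
AS
-- Skip statements that set the flag themselves, such as the
-- synchronization procedure marking a row as synchronized.
IF NOT UPDATE(Synchronized)
    UPDATE p
    SET    Synchronized = 0
    FROM   Products AS p
    INNER JOIN inserted AS i ON i.ProductID = p.ProductID
```

One caveat with this design: an application that writes every column on each save, including Synchronized, would bypass the reset, so such an application would need to set the flag itself.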
Does It Really Work?
If you're still following along in this example, you
can now see for yourself how this works. Open the NorthwindCS.ADP Client/Server
sample database with Microsoft Access. Go to the database window, choose the
Tables tab, and double-click on the Products table to open it. Delete a few
records, insert a few new records, and change a few records. For the changed
records, set the synchronized flag to False (0). (To delete existing records,
you will have to remove the referential integrity constraint between the Order
Details and Products tables.) Issue the CREATE PROCEDURE statement shown in
Figure 3, then execute it as follows:
When you query the data on the iSeries, all of your modifications to the SQL Server table should be reflected.
Figure 4: This T-SQL shows how to use an updateable cursor on the iSeries
within a distributed transaction.
The major difference between this code and the code in Figure 3 (other than the table reversal) is that the transaction boundary has to be placed before the cursor declaration. This means that all of the records will be involved within the transaction boundary. To have an updateable cursor on a linked server, SQL Server requires that the isolation level be set to repeatable read or serializable. These locking levels are restrictive in terms of record locking, so use updateable cursors sparingly.
The one other thing to be aware of is that I had to modify the ODBC DSN with a default commitment control level of *NONE. Without this setting, I would erratically get error messages stating that the required transaction isolation level could not be achieved.
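The figure is not reproduced in this text. The sketch below illustrates the pattern just described, under several assumptions: the iSeries table is the source, changed rows there carry SYNCHRONIZED = 0, only UNITSINSTOCK is copied, and the RDB name is S1024000. It also presumes the unique index and the *NONE commitment control setting on the DSN.

```sql
-- Sketch only: updateable cursor on the iSeries inside one DT.
SET XACT_ABORT ON
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

DECLARE @ProductID int, @UnitsInStock smallint

-- The transaction boundary comes before the cursor declaration,
-- so every row the cursor touches belongs to the same transaction.
BEGIN DISTRIBUTED TRANSACTION

DECLARE RemoteChanges CURSOR LOCAL KEYSET FOR
    SELECT PRODUCTID, UNITSINSTOCK
    FROM   ISERIES.S1024000.NORTHWIND.PRODUCTS
    WHERE  SYNCHRONIZED = 0
    FOR UPDATE OF SYNCHRONIZED

OPEN RemoteChanges
FETCH NEXT FROM RemoteChanges INTO @ProductID, @UnitsInStock

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Apply the iSeries value to the local table.
    UPDATE Products
    SET    UnitsInStock = @UnitsInStock
    WHERE  ProductID = @ProductID

    -- Mark the current iSeries row as synchronized through the cursor.
    UPDATE ISERIES.S1024000.NORTHWIND.PRODUCTS
    SET    SYNCHRONIZED = 1
    WHERE  CURRENT OF RemoteChanges

    FETCH NEXT FROM RemoteChanges INTO @ProductID, @UnitsInStock
END

CLOSE RemoteChanges
DEALLOCATE RemoteChanges

COMMIT TRANSACTION
```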
Trials and Tribulations of New Technology
Even though DTs are extremely useful and will continue to grow in popularity, there are still pitfalls. While the end product looks easy enough, it takes quite a bit of fiddling to get everything to work correctly. Listed below are some of the major things I battled with:
Linked Server Errors Cause Processing to Halt
Even though the code shows a tidy Commit and Rollback, the fact is that, when a linked server error occurs, the entire procedure stops with an error severity of 16. As far as I can tell, there is no way to trap these errors. (If someone knows a way around this, please let me and everyone else know by posting a note to the forum associated with this article.) If, for example, a record on the iSeries is locked so that it can't be changed, the procedure will just stop instead of allowing a programmatic response to the condition. This is the worst drawback I encountered.
Be careful when entering four-part table names because the RDB name, schema, and table names should be entered in uppercase. In a few cases, when I used an iSeries side cursor, column names seemed to be case-sensitive as well.
If you need an updateable iSeries side cursor, SQL Server requires that the table have a unique index. If for some reason your base table isn't able to have a unique index, you can use a read-only cursor with individual UPDATE statements to change the data.
Service Pack Levels
- OS/400 V5R1 with Group Database Fix SF99501-04
- Windows XP Professional with Service Pack 1
- SQL Server 2000 (with no service pack and with Service Packs 1 and 3)
- Client Access V5R1 SI05361 and SI06804
- iSeries Access V5R2 SI07675, SI06631 (SI05853 didn't work)
Things are a little too fragile for my liking. Unfortunately, it seems that the CA ODBC driver's ability to work with DQs and DTs changes from service pack to service pack. For instance, I had complete success with everything shown in this article using CA V5R2 SI07675. However, SI05853 was a complete flop. The V5R1 SI06804 did everything except for the iSeries-side updateable cursor.
My only reason for sharing this information is that it was frustrating trying to find the right combination of software levels to make the thing work!
Ensuring the Veracity and Timeliness of Shared Data
As the requirements for sharing data between
platforms in real time increase, so will the popularity of DTs. Their ease of
use and ability to guarantee the "all or nothing" concept among multiple
database servers make them an ideal candidate for fulfilling many of the
cross-platform interface requirements.