Remote Journaling and Data Recovery


Data recovery is an important factor in any disaster recovery plan. One of the main challenges in a disaster recovery operation is getting a copy of the latest data on the target system.

Traditionally, this has been the domain of a High Availability (HA) provider, who would use a journal-scraping technique to capture changes as they happen, copy them to a remote system, and apply them there in near real time.

With the introduction of remote journaling, some vendors built a similar process that scrapes the Remote Journal object instead. This removes the need for a separate transport layer to move the data between the systems, and again, near real-time replication is achieved. Some customers, however, cannot afford to implement these solutions, so they have turned to alternatives such as hot site or mobile recovery, which are standard offerings from disaster recovery providers. These solutions allow them to recover, but only after an extended period before they can be up and running.

More and more companies are realizing that they cannot afford to be without their iSeries for more than a few hours; any longer, and they risk losing their business altogether. The amount of data pumped into databases is growing rapidly, and the time required to rebuild the system following a loss is increasing. Add to this the time and effort required to re-enter lost data while keeping the input of new data flowing, and catching up can become almost impossible.

If only you could input and apply the data in real time! With remote journaling, the data is available on the target system, but you can't use the Apply/Remove Journaled Changes (APY/RMVJRNCHG) commands against a Remote Journal object. The journaled objects are associated with the local journal on the source system, and after a disaster that system can't be accessed. So how do you proceed?

First, you need a copy of the objects that the remote journal changes can be applied to. The save from last night--or a set of incremental saves if you're using Save Changed Objects (SAVCHGOBJ)--is a good starting point. Remote journaling needs to have been implemented for the required objects, and the receivers must still be online. A local journal environment that is a mirror of the source system local journal has to exist so that when the restore is carried out, the files will automatically attach themselves to the journal. Then, you must get the data that was deposited in the remote receivers since the last save and apply it to the restored objects.

This is the tricky part! You have to fool the system into thinking that the data in the receivers is relevant to the objects so that it can be applied. Remember, the Remote Journal object doesn't know that those objects exist on this system; it only knows they were attached to a local journal on the source system.

I set out to prove that it's possible to fool the system. The receivers have no affiliation with the objects; they only hold the data that the journal has captured. It is the journal that has the affiliation with the objects. So all I needed to do was copy the receivers from the remote journal's library to the local journal's library. Once the receivers were associated with the local journal, the APY/RMVJRNCHG commands worked perfectly!
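The idea can be illustrated with a toy Python model. It is an illustration only, not IBM i code, and the classes, fields, and JID strings are hypothetical: a receiver is just a list of entries, while the journal is what knows which objects those entries belong to. Replaying the receiver through a journal with no associated objects fails; replaying a copy of it through a journal that does know the objects succeeds.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    seq: int
    jid: str   # identifier of the journaled object the entry describes
    data: str  # record image added by the change (this toy model only adds)

@dataclass
class Journal:
    name: str
    objects: dict = field(default_factory=dict)    # jid -> list of records
    receivers: list = field(default_factory=list)  # each receiver is a list of Entry

    def apply(self):
        """Replay every entry in every receiver against the associated objects."""
        for receiver in self.receivers:
            for entry in receiver:
                if entry.jid not in self.objects:
                    raise LookupError(f"{entry.jid} is not journaled to {self.name}")
                self.objects[entry.jid].append(entry.data)

# The remote journal holds the receiver but has no associated objects.
receiver = [Entry(30, "JID-FILEA", "rec1"), Entry(31, "JID-FILEA", "rec2")]
remote = Journal("RMTJRN/TSTJRN", receivers=[receiver])

# The local journal on the target knows FILEA because FILEA was restored there.
local = Journal("JRNLIB/TSTJRN", objects={"JID-FILEA": []})
local.receivers.append(list(receiver))  # "copy" the receiver to the local journal
local.apply()
print(local.objects)  # {'JID-FILEA': ['rec1', 'rec2']}
```

Calling `remote.apply()` in this model raises `LookupError`, which is the toy equivalent of the remote journal having no knowledge of the objects.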

Where This Could Be Used

Now you have the information you need to develop a whole range of recovery options. Because you know the data can be updated using the APY/RMVJRNCHG commands, you can provide options that were not available previously. There are many more options than those listed here, but here are a few to start with.

Your Own Resources

Suppose you have a system that could be used for recovery, but you don't have the budget for an HA product and its implementation. All you have to do is create the required objects on the target system and maintain them using save and restore operations. Remote journaling will keep the data changes made since the last save. If you need to recover, you replay those changes against the database using the APYJRNCHG command and clean up any object changes. The recovery time will not match that of an HA product, but, depending on the time of the failure, it could be very acceptable. If the failure occurs before any changes have been made since the last save, you will be in the same position as you would be with an HA product.

http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)00.png

Hot Site Provider

Hot site providers can target new offerings in which remote journaling is used to store real-time updates from source systems. A true copy of your system is maintained using the daily saves, and in the event of a failure, you can replay the information against the database using the APYJRNCHG command and clean up any object changes. Logical partitioning (LPAR) is a major contributor to this solution. Recovery time is extended only by the time it takes to get to the hot site and start the process. The amount of data loss is reduced too. Previously, all you had to work with was the last save; you couldn't apply any later changes because you didn't have a copy of them.

http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)01.png

 

Remote Data Vault

A remote data vault is a type of hot site in which only your remote journals are stored. In the event of a system loss, a new system is rebuilt using your saves, and then you replay the information that was captured in the remote journals against the database using the APYJRNCHG command and clean up any object changes. The vendor stores only the Remote Journal objects and data for you. With this solution, management processes must be installed so that when a save operation completes, the receivers are deleted, because the save then holds the same information. Recovery is extended because of the time it takes to create the base system; however, data loss is minimal.
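The receiver-management rule for a vault can be sketched as follows. This is a hypothetical Python helper, not a vendor tool, and it rests on stated assumptions: each receiver's first and last sequence numbers are known, the save's position in the journal is known as a sequence number, and the last receiver in the chain is the attached one, so it is never deleted.

```python
from typing import NamedTuple

class Receiver(NamedTuple):
    name: str
    first_seq: int
    last_seq: int

def receivers_to_delete(chain, save_seq):
    """Return receivers whose entries are all covered by a save taken at save_seq.

    chain is ordered oldest to newest; the newest receiver is assumed to be
    attached and is therefore never returned.
    """
    detached = chain[:-1]
    return [r.name for r in detached if r.last_seq <= save_seq]

chain = [
    Receiver("RCV0000001", 1, 120),
    Receiver("RCV0000002", 121, 300),
    Receiver("RCV0000003", 301, 340),
]
print(receivers_to_delete(chain, 250))  # ['RCV0000001']
```

RCV0000002 survives a save at sequence 250 because entries 251 to 300 in it are not yet covered by that save.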

http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)02.png

Application Recovery

Replication using an HA product results in near real-time replication. Should a failure occur, the data on the target system could be too far forward; that is, it may include changes that have to be removed before jobs can be resubmitted. Using the remote journal information and the Remove Journaled Changes (RMVJRNCHG) command, this data can now be easily removed back to a start point compatible with a job restart. The job information in the journal relates to the job on the source system, so as long as you know which jobs were open at the time of the failure, you can use this information to remove the relevant changes. (Note: RMVJRNCHG removes every entry for the selected objects back to the job's open entry, not just the entries made by that job. Any changes that other jobs made to the same objects after that job opened them will also be removed.)
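The side effect described in the note can be seen in a toy Python model. The names are hypothetical, and the code models the behavior described above rather than the RMVJRNCHG implementation: removing changes newest-first back to one job's open entry also backs out changes made by any other job after that point.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    seq: int
    job: str
    kind: str                   # "OPEN", "ADD" or "UPDATE" in this toy model
    key: Optional[str] = None
    before: Optional[str] = None
    after: Optional[str] = None

def remove_to_job_open(entries, rows, job):
    """Undo entries newest-first back to the job's latest open entry.

    Returns the set of jobs whose changes were backed out.
    """
    open_seq = max(e.seq for e in entries if e.job == job and e.kind == "OPEN")
    affected = set()
    for e in sorted(entries, key=lambda x: x.seq, reverse=True):
        if e.seq <= open_seq:
            break
        if e.kind == "ADD":
            del rows[e.key]
        elif e.kind == "UPDATE":
            rows[e.key] = e.before
        if e.kind != "OPEN":
            affected.add(e.job)
    return affected

entries = [
    Entry(1, "JOBA", "ADD", "k1", after="a1"),
    Entry(2, "BATCH", "OPEN"),
    Entry(3, "BATCH", "UPDATE", "k1", before="a1", after="b1"),
    Entry(4, "JOBC", "ADD", "k2", after="c1"),   # another job, same file
    Entry(5, "BATCH", "ADD", "k3", after="b2"),
]
rows = {"k1": "b1", "k2": "c1", "k3": "b2"}
print(sorted(remove_to_job_open(entries, rows, "BATCH")))  # ['BATCH', 'JOBC']
print(rows)  # {'k1': 'a1'}
```

JOBC's record is gone even though only BATCH was being backed out, which is why you need to know what else ran against those files before using this approach.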

http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)03.png

A Picture's Worth a Thousand Words!

Below is a pictorial view of what I set out to achieve. The yellow items relate to what goes on constantly during normal operation. The blue items reflect the fact that the journal existed on the target system but was static, because no changes from the source system were being applied to it. The red items show the process I followed to update the objects using the remote journal receivers. I could have expanded the picture to show a daily save and restore of the database objects, but that's for another day!

http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)04.png
How I Did It

To test the new functionality of remote journaling and identify the process that allows the APY/RMVJRNCHG commands to be run against remote journal receivers, I set up a test environment. It was very simple, because it only had to prove the concept. The next stage will be to test the additional capabilities of remote journaling, such as data area, data queue, and IFS object support. Because commitment control is generally self-healing, I decided not to test it: doing so would mean forcing an abrupt end of the system while commitment control is active, which would be too complex a task and could also cause additional damage to the system. My test was only to confirm the ability to use the information stored in a remote journal to update the target database.

I used two systems: a 170 and a 720, both running V5R1. The systems were connected on a LAN using Ethernet and TCP/IP.

170
DTALIB--Contained all of the files to be replicated
Files--FILEA, FILEB, FILEC, FILED
JRNLIB--Contained the journal environment
Journal--TSTJRN
Receiver--RCV0000001
Message Queue--JRNMSGQ (could have been QSYSOPR)

720
Same objects as on the 170 system plus...
RMTJRN--Contained the remote journal environment
SAVLIB--Contained the save files used for the save and restore process
Save File--TEMP

A few notes:
1) Only the library has to exist before the Add Remote Journal (ADDRMTJRN) command is issued; the journal is created in the remote journal library when ADDRMTJRN runs.
2) I chose to use a separate library for the remote journal on the target system so that the recovery method would work.
3) The remote journal receiver is created when the remote journal is activated.
4) Make sure journaling is started with IMAGES(*BOTH) so that both before- and after-images are captured.

Setting Up the Test Environment

First, create the required libraries on both systems to segregate the relevant objects. I separated the data objects from the journal objects because of what happens when you save a library that contains a journal and restore it to a system where that library hasn't existed before: the OS automatically attaches a new receiver to the journal when the journal is restored, and the saved receiver is then restored as a partial receiver. This happens because the receiver was attached at the time it was saved, the journal (*JRN) is restored before its receivers (*JRNRCV), and a journal must have a receiver attached to exist. This situation doesn't affect the test, but I felt keeping the objects apart was prudent. The remote journal must exist in a separate library for the test to work as described. The SAVLIB library is for convenience more than anything else.

So, run the following commands on the source system:

CRTLIB LIB(DTALIB) TEXT('Remote Journal Test DATA Library')
CRTLIB LIB(JRNLIB) TEXT('Remote Journal Test JRN Library')


Then, run these commands on the target system:

CRTLIB LIB(DTALIB) TEXT('Remote Journal Test DATA Library')
CRTLIB LIB(JRNLIB) TEXT('Remote Journal Test JRN Library')
CRTLIB LIB(RMTJRN) TEXT('Remote Journal Test RMTJRN Library')
CRTLIB LIB(SAVLIB) TEXT('Remote Journal Test SAVE Library')


Next, create the journal environment on both systems. I created only the Local Journal objects, as the Remote Journal objects are created later using the Add Remote Journal (ADDRMTJRN) and Change Remote Journal (CHGRMTJRN) commands. The order in which you create these objects is important. The receiver and message queue have to exist before the journal can be created. The restore of the objects to the target system will create a link between the journal on that system and the objects. I could have saved the journal object and restored it to the target system--which removes the need to do the CHGJRN command before restoring the receivers from the remote journal--but the method I chose seems simpler.

Now, run these commands on the source system and on the target system:

CRTJRNRCV JRNRCV(JRNLIB/RCV0000001) TEXT('Test Journal Receiver')
CRTMSGQ MSGQ(JRNLIB/JRNMSGQ) TEXT('Journal Message Queue')
CRTJRN JRN(JRNLIB/TSTJRN) JRNRCV(JRNLIB/RCV0000001) MSGQ(JRNLIB/JRNMSGQ) TEXT('Remote Journal Test (Local journal)')


Now, create the files required on the source system. I chose to test only a few files. The aim of this test is to show the use of the APY/RMVJRNCHG commands, not to see how many objects can be included in the use of the commands. I created more than one file to show that the commands can handle a variety of requests.

Enter these commands on your source system:

CRTPF FILE(DTALIB/FILEA) RCDLEN(100) TEXT('Test File') 
CRTPF FILE(DTALIB/FILEB) RCDLEN(100) TEXT('Test File') 
CRTPF FILE(DTALIB/FILEC) RCDLEN(100) TEXT('Test File')
CRTPF FILE(DTALIB/FILED) RCDLEN(100) TEXT('Test File')


Now, journal the files on the source system to make sure all updates are captured as soon as the objects exist, thereby ensuring the files are associated with the journal object. When they are restored, the OS will try to attach them to a journal of the same name and library. This is, therefore, an important step; if you save the files before journaling is started and then try to use the APY/RMVJRNCHG commands against the remote journal receivers, the operation will fail, stating that the required objects do not exist on the system. This happens because no link has been created on the source system to the journal object you tried to run the commands against.

Note: You must journal the objects with IMAGES(*BOTH). Otherwise, the RMVJRNCHG command can fail: to back out a change, the system replaces the record with its before-image, so it has to know what the record looked like before the change.
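Why the before-image matters can be shown with a toy Python model (illustrative only; the function names and the dictionary entry layout are hypothetical): removing an update means putting back the record as it was before the change, which is only possible if the journal recorded it.

```python
def journal_update(journal, rows, rrn, new_image, images):
    """Update record rrn and journal the change.

    The before-image is kept only when images == "*BOTH", mirroring
    IMAGES(*BOTH) versus IMAGES(*AFTER) in this toy model.
    """
    before = rows[rrn] if images == "*BOTH" else None
    journal.append({"rrn": rrn, "before": before, "after": new_image})
    rows[rrn] = new_image

def remove_last_change(journal, rows):
    """Back out the newest journaled update by restoring its before-image."""
    entry = journal[-1]
    if entry["before"] is None:
        raise ValueError(f"no before-image for record {entry['rrn']}; cannot remove")
    rows[entry["rrn"]] = entry["before"]
    journal.pop()

rows, journal = {1: "ORIGINAL"}, []
journal_update(journal, rows, 1, "CHANGED", images="*BOTH")
remove_last_change(journal, rows)
print(rows)  # {1: 'ORIGINAL'}
```

With `images="*AFTER"`, the same removal raises `ValueError` and the record stays changed.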

Here are the commands for the source system:

STRJRNPF FILE(DTALIB/FILEA DTALIB/FILEB DTALIB/FILEC DTALIB/FILED) JRN(JRNLIB/TSTJRN) IMAGES(*BOTH) OMTJRNE(*OPNCLO)


Save the files from the source system, and restore them to the target system. This copies the objects, complete with their Journal ID (JID), to the remote system. This step is very important: when an object is journaled, the system assigns it a JID, a 10-byte value that uniquely identifies the journaled object on the system. Every journal entry for the object carries this JID, and when the APY/RMVJRNCHG commands run, they check that the JID in the object matches the one in the journal entry. Failure to carry out this step will cause the test to fail.
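The JID check can be sketched with a toy Python model (hypothetical names; the JIDs are random bytes purely for illustration, and how the system actually generates JIDs is not modeled). A restored object keeps the saved JID, so entries match it; a file of the same name that was created and journaled separately gets a different JID, so the same entries don't match.

```python
import secrets

class JournaledFile:
    """Toy journaled file: starting journaling assigns a fresh 10-byte JID."""

    def __init__(self, name, jid=None):
        self.name = name
        self.jid = jid if jid is not None else secrets.token_bytes(10)
        self.records = []

def restore(saved):
    """A restore keeps the saved object's JID."""
    return JournaledFile(saved.name, jid=saved.jid)

def apply_entries(entries, files):
    """Apply (jid, record) entries, matching each entry to a file by JID."""
    by_jid = {f.jid: f for f in files}
    for jid, record in entries:
        if jid not in by_jid:
            raise LookupError("entry refers to an object that is not on this system")
        by_jid[jid].records.append(record)

source = JournaledFile("FILEA")
entries = [(source.jid, "rec1"), (source.jid, "rec2")]

restored = restore(source)          # saved and restored: JIDs match
apply_entries(entries, [restored])
print(restored.records)             # ['rec1', 'rec2']
```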

Run these on the source system (the save file must exist before you can save to it):

CRTSAVF FILE(QGPL/TEST)
SAVOBJ OBJ(*ALL) LIB(DTALIB) DEV(*SAVF) SAVF(QGPL/TEST)


You can use any transfer method at your disposal to move the save file to the target system. I used FTP, transferring the save file in binary mode.

Now, run this on the target system:
RSTOBJ OBJ(*ALL) SAVLIB(DTALIB) DEV(*SAVF) SAVF(QGPL/TEST) MBROPT(*ALL) ALWOBJDIF(*ALL)

This is what the environments look like so far:
http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)05.png


The next step is to create a relational database directory entry for the target system. This is required for the ADDRMTJRN command.

Run this command on the source system:
ADDRDBDIRE RDB(SHIELDSYS2) RMTLOCNAME('192.168.100.7' *IP) TEXT('remote data base on system 2')

Now, add a remote journal to the local journal. This is how the remote journal objects are created. You must have a relational database directory entry and a valid communications link for this to work. The remote journal library is all that has to exist on the remote system. I separated the local and remote journal libraries on the target system to ensure the test would work.

Run this command on the source system:

ADDRMTJRN RDB(SHIELDSYS2) SRCJRN(JRNLIB/TSTJRN) TGTJRN(RMTJRN/TSTJRN) RMTRCVLIB(RMTJRN) MSGQ(QSYSOPR) TEXT('Remote Journal test (Remote Journal)')


Activate the remote journal. The system will then ensure that the entries placed in the local journal are also transmitted over the communications link to the remote journal. I chose asynchronous delivery (*ASYNC) because I didn't need a guarantee that each entry had reached the target system before the operation completed on the source. If you are using an HA product as the transport mechanism, you would only have an asynchronous process, because the HA product has to extract the entries and then transport and store them on the remote system separately.
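The trade-off behind the delivery choice can be sketched with a toy Python model. The class and method names are hypothetical, and the model covers only the ordering guarantee, not how the OS transmits entries: with synchronous delivery an entry reaches the target before the deposit completes, while with asynchronous delivery entries still queued on the source when it fails never reach the target.

```python
from collections import deque

class RemoteJournalLink:
    """Toy model of delivering journal entries from a source to a target."""

    def __init__(self, delivery):
        self.delivery = delivery          # "*SYNC" or "*ASYNC"
        self.local, self.remote = [], []
        self.in_flight = deque()          # entries not yet sent (async only)

    def deposit(self, entry):
        self.local.append(entry)
        if self.delivery == "*SYNC":
            self.remote.append(entry)     # the target has it before deposit returns
        else:
            self.in_flight.append(entry)  # sent later, independently of the job

    def send(self, count):
        """Background sender: move up to count queued entries to the target."""
        for _ in range(min(count, len(self.in_flight))):
            self.remote.append(self.in_flight.popleft())

    def source_fails(self):
        """Return the entries that never reached the target."""
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost

link = RemoteJournalLink("*ASYNC")
for seq in (30, 31, 32):
    link.deposit(seq)
link.send(1)
print(link.source_fails())  # [31, 32]
```

For a proof of concept like this one, the possibility of losing the last few entries in flight doesn't matter; for a production recovery plan it is the main thing to weigh.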

Use this command on the source system:

CHGRMTJRN RDB(SHIELDSYS2) SRCJRN(JRNLIB/TSTJRN) TGTJRN(RMTJRN/TSTJRN) JRNSTATE(*ACTIVE) DELIVERY(*ASYNC)


Now, the links are created and the setup is complete.

http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)06.png
Testing the Theory

You're ready to start the test! Because you are only looking at the ability to use the APY/RMVJRNCHG commands against the remote journal receivers, you can use the Update Data (UPDDTA) command to create the changes to the files. First, you need to add some new records into the files. Then, you will change the data you have added to demonstrate the ability to change records. This is done on the source system only. (I left out updating FILED on purpose.)

UPDDTA FILE(DTALIB/FILEA)
UPDDTA FILE(DTALIB/FILEB)
UPDDTA FILE(DTALIB/FILEC)


You can enter as much or as little data as you wish, but I suggest creating a number of records with multiple updates against each of them. This lets you apply or remove a variable number of changes and then verify that the data is as you expect. Remember that these commands can be misused, and any corruption is most likely to come from a misunderstanding of the process. If you try to use APYJRNCHG on changes that have already been applied, you will cause errors.
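A toy Python model shows why replaying the same range twice goes wrong. The names are hypothetical, and the model assumes, for illustration, that an apply fails when an added record's relative record number is already occupied: the first pass rebuilds the data, and the second pass collides with records the first pass already created.

```python
def apply_changes(entries, rows):
    """Replay (seq, entry_type, rrn, image) tuples against rows keyed by RRN.

    "PT" adds a record and "UP" replaces it with its after-image; the codes
    echo journal entry types, but the behavior is a simplification.
    """
    for seq, entry_type, rrn, image in entries:
        if entry_type == "PT":
            if rrn in rows:
                raise RuntimeError(f"entry {seq}: record {rrn} already exists")
            rows[rrn] = image
        elif entry_type == "UP":
            rows[rrn] = image

entries = [(30, "PT", 1, "ADDED"), (31, "UP", 1, "ADDED+UPDATED")]
rows = {}
apply_changes(entries, rows)
print(rows)  # {1: 'ADDED+UPDATED'}
```

Running `apply_changes(entries, rows)` a second time raises `RuntimeError` at entry 30.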

Once you have populated the files with some data and carried out a few updates, you're ready to apply those changes to the backup database.

Understand the Issues

First, you have to attach the receivers that have been created and maintained by the Remote Journal function to the local journal on the target system. This is carried out by a few simple actions, as follows.

When you save the receiver from the remote journal library, you will have to restore it into the local journal library on the target system. Because the local journal's own receiver has the same name (RCV0000001), you will receive an error stating that the object already exists unless you take action first! In my test, I resolved this by running the CHGJRN command against the local journal on the target system with *GEN for the receiver parameter, which created and attached a new receiver, RCV0000002.

Because I had only one receiver attached, I only had to delete RCV0000001 from JRNLIB to allow the test to continue. If, however, I had saved only the journal object and restored it on the target system, the OS would have automatically attached a receiver with a non-conflicting name. In one test I ran, a journal that was saved with RCV0000007 attached had RCV2000007 attached when it was restored.

I went on to run CHGJRN on the source system and carry out further tests using multiple receivers. The results were the same, so I won't detail those tests. For the process to work, just be sure that the start and end sequence numbers and receiver names are correct.

Running the Test and Evaluating the Results

Create the save file in SAVLIB. You need to be able to save the receiver object from the Remote Journal Library, and a save file may be a quicker and simpler method than using tape. It was for me.

Run this command on the target system:

CRTSAVF FILE(SAVLIB/TEMP) TEXT('Remote journal Test save file')


Save the receiver to the save file. I had only one receiver, so I didn't have to determine which ones were required.

Run this on the target system:
SAVOBJ OBJ(*ALL) LIB(RMTJRN) DEV(*SAVF) OBJTYPE(*JRNRCV) SAVF(SAVLIB/TEMP)

Change the local journal on the target system so that its attached receiver (RCV0000001) can be removed. This is necessary because you'll be restoring a receiver with the same name, RCV0000001, into the library.

Run this on the target system:

CHGJRN JRN(JRNLIB/TSTJRN) JRNRCV(*GEN)


Delete the old receiver from the local journal on the target system. Afterward, only RCV0000002 exists in the library. The DLTOPT(*IGNINQMSG) value tells the system to ignore the inquiry message that is normally sent if you try to delete a receiver before it has been saved.

Run this on the target system:

DLTJRNRCV JRNRCV(JRNLIB/RCV0000001) DLTOPT(*IGNINQMSG)


Restore the receiver from the remote journal to the local journal. The receiver will be restored and can be used for data replication. Because there is a receiver already "attached" to the journal, the system will still restore the new receiver as a "partial" receiver because the status of the receiver when it was saved was "attached." Only one receiver can be in attached status. While there are limitations as to what you can do with a partial receiver using the APY/RMVJRNCHG commands, they are not important for this test.

Run this on the target system:
RSTOBJ OBJ(*ALL) SAVLIB(RMTJRN) DEV(*SAVF) SAVF(SAVLIB/TEMP) MBROPT(*ALL) ALWOBJDIF(*ALL) RSTLIB(JRNLIB)

 
http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)07.png
Now, identify the sequence numbers required: from the last save entry + 1 to the last update entry for the files in the test. Note that the receiver chain will be broken even though you have both RCV0000001 and RCV0000002. RCV0000002 records RCV0000001 as its previous receiver, but the restored RCV0000001 knows nothing about RCV0000002, so the OS will complain if you try to apply changes that span this boundary.

Run this on the target system:
DSPJRN JRN(RMTJRN/TSTJRN)
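The broken link can be pictured with a toy Python model. The names are hypothetical, and it assumes, as a simplification, that each receiver records a previous link when it is created and a next link only when it is detached on the same system: a range from RCV0000001 to RCV0000002 can't be followed, while a range that stays inside RCV0000001 can. That is consistent with the APYJRNCHG command in this test specifying RCV0000001 as both ends of RCVRNG.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Receiver:
    name: str
    prev_name: Optional[str] = None  # recorded when the receiver is created
    next_name: Optional[str] = None  # recorded only when detached on this system

def range_is_linked(receivers, start, end):
    """Follow next links from start; True only if end is reached."""
    by_name = {r.name: r for r in receivers}
    current = by_name[start]
    while current.name != end:
        following = by_name.get(current.next_name) if current.next_name else None
        if following is None or following.prev_name != current.name:
            return False
        current = following
    return True

# RCV0000001 was restored from the remote journal library, so it has no next link;
# RCV0000002 was created on the target by CHGJRN ... JRNRCV(*GEN).
rcv1 = Receiver("RCV0000001")
rcv2 = Receiver("RCV0000002", prev_name="RCV0000001")
print(range_is_linked([rcv1, rcv2], "RCV0000001", "RCV0000002"))  # False
print(range_is_linked([rcv1, rcv2], "RCV0000001", "RCV0000001"))  # True
```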

Apply the journal changes to the local files. The first entry after the objects were saved was 30, and the last entry in the journal was 86, so here's the command to run on the target system:

APYJRNCHG JRN(JRNLIB/TSTJRN) FILE((DTALIB/*ALL)) RCVRNG(JRNLIB/RCV0000001 JRNLIB/RCV0000001) FROMENT(30) TOENT(86)
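Picking FROMENT and TOENT follows the rule stated earlier: the last save entry + 1 through the last record-level change. A hypothetical Python helper over a simplified DSPJRN listing makes the arithmetic explicit. The entry list is invented except for the 30 and 86 from this test, and the journal codes (F/MS for a saved member, R/PT and R/UP for added and updated records) serve only as labels here.

```python
from typing import NamedTuple

class JrnEntry(NamedTuple):
    seq: int
    code: str    # journal code, e.g. "F" (file level) or "R" (record level)
    etype: str   # entry type, e.g. "MS" (member saved), "PT" (add), "UP" (update)
    obj: str

def change_range(entries, files):
    """Return (first, last): last save entry + 1 through the last record change."""
    saves = [e.seq for e in entries if e.code == "F" and e.etype == "MS" and e.obj in files]
    changes = [e.seq for e in entries if e.code == "R" and e.obj in files]
    return max(saves) + 1, max(changes)

# Hypothetical listing; only 30 and 86 come from the test described above.
entries = [
    JrnEntry(26, "F", "MS", "FILEA"), JrnEntry(27, "F", "MS", "FILEB"),
    JrnEntry(28, "F", "MS", "FILEC"), JrnEntry(29, "F", "MS", "FILED"),
    JrnEntry(30, "R", "PT", "FILEA"), JrnEntry(57, "R", "UP", "FILEB"),
    JrnEntry(86, "R", "UP", "FILEC"),
]
first, last = change_range(entries, {"FILEA", "FILEB", "FILEC", "FILED"})
print(f"APYJRNCHG ... FROMENT({first}) TOENT({last})")  # FROMENT(30) TOENT(86)
print(f"RMVJRNCHG ... FROMENT({last}) TOENT({first})")  # FROMENT(86) TOENT(30)
```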

This is what you have now:
 http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)08.png
Now, it's time to check the files to ensure the updates worked. The target files were empty before the apply; they should now be an exact copy of the source files. When you ran the UPDDTA commands, you included text in the record data to mark the records that had been updated rather than just created, which lets you track the updates as well as the new records. FILED was empty, and no changes were applied to it, so it should still be empty.

Run this on the source system to check:
DSPPFM FILE(DTALIB/FILEA)
DSPPFM FILE(DTALIB/FILEB)
DSPPFM FILE(DTALIB/FILEC)
DSPPFM FILE(DTALIB/FILED)

And then run the same thing on the target system.

Remove the journal changes. This time, you are going from last to first entry because the RMVJRNCHG has to start with the last change and work backward. So here's what to run on the target system:

RMVJRNCHG JRN(JRNLIB/TSTJRN) FILE((DTALIB/*ALL)) RCVRNG(JRNLIB/RCV0000001 JRNLIB/RCV0000001) FROMENT(86) TOENT(30)
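The reason for the reversed range can be seen in a toy Python model (hypothetical names and entry layout): undoing changes newest-first walks each record back through its history, whereas undoing them oldest-first would delete a record and then recreate it from a later before-image.

```python
def apply_forward(entries, rows):
    """APYJRNCHG-style in this model: oldest entry first, writing each after-image."""
    for e in entries:
        rows[e["rrn"]] = e["after"]

def remove_backward(entries, rows):
    """RMVJRNCHG-style in this model: newest entry first, undoing each change."""
    for e in reversed(entries):
        if e["type"] == "PT":
            del rows[e["rrn"]]            # undo an add by deleting the record
        else:
            rows[e["rrn"]] = e["before"]  # undo an update with its before-image

entries = [
    {"seq": 30, "type": "PT", "rrn": 1, "before": None, "after": "NEW"},
    {"seq": 31, "type": "UP", "rrn": 1, "before": "NEW", "after": "NEW+UPD"},
]
rows = {}
apply_forward(entries, rows)
remove_backward(entries, rows)
print(rows)  # {}
```

Undoing the same two entries oldest-first would leave `{1: 'NEW'}` behind instead of an empty file.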

Check the files to ensure the changes were removed. You know the target files were empty before the apply, so checking the files on the target system is all that is needed to confirm they are empty again. Run this on the target system:

DSPPFM FILE(DTALIB/FILEA)
DSPPFM FILE(DTALIB/FILEB)
DSPPFM FILE(DTALIB/FILEC)

After RMVJRNCHG, you have this:
http://www.mcpressonline.com/articles/images/2002/Remote%20Journaling%20and%20Data%20Recovery_2%20(V4)09.png
Did It Work?
This concludes the test and confirms that the remote journal can be used to update the remote database without the use of an HA product. Obviously, there are restrictions, but these restrictions are not showstoppers for most users. New features are being added all the time, and as I test those features, I will create documentation to show what can be achieved.

Sidebar: PRPQ 5799 AJC--Another Improvement!

The free PRPQ 5799 AJC for V5R1 allows the replay of additional object-wide changes. Unfortunately, it does not support earlier releases, because the PRPQ uses features available only in V5R1. However, these features are standard in V5R2. A major benefit of the PRPQ is the ability to replicate more object commands using the Apply Journaled Changes Extended (APYJRNCHGX) command, an extended version of the APYJRNCHG command. While there are limitations, which are listed below, the additional support provided will be welcomed by most. I have been informed by reliable sources that the PRPQ also provides other improvements that offer increased flexibility and speed when running the APY/RMVJRNCHG commands.

APYJRNCHGX provides the capability to replay many object-level OS/400 commands. For example, the Create File and Change File commands have been enhanced for V5R1 so that they now emit new journal entries that APYJRNCHGX can recognize and replay.

This product's key goal is to improve recoverability of an application running on the iSeries. Prior to V5R1, many object-level operations were not journaled. Now they are, but APYJRNCHG is not capable of applying object-level journal entries (such as those for an ALTER TABLE SQL statement). The command supplied with this PRPQ, APYJRNCHGX, does apply the object-level operations.

This PRPQ is especially useful in environments where object-level changes occur between database backups. If an application creates or alters tables (or otherwise makes object-level changes) during productive operations, then this PRPQ provides the ability to more fully recover the database in the event of a disaster.

The APYJRNCHGX command applies the changes that have been journaled for a particular journaled object to a saved version of the object to recover it after an operational error or some form of damage. The difference between APYJRNCHGX and APYJRNCHG is that object-level changes are included as part of the APYJRNCHGX apply. Examples of object-level changes include the following SQL statements:

  • CREATE TABLE
  • CREATE INDEX
  • ALTER TABLE
  • DROP INDEX


Many object-level OS/400 commands (for example, CHGPF and DLTF) also deposit journal entries. For a complete list of object-level journal entries, refer to the online help of APYJRNCHGX or the Backup and Recovery book (SC41-5304).

For example, here's the command you'd use to apply changes to an SQL collection:
APYJRNCHGX JRN(MYCOLL/QSQJRN) FILE(MYCOLL/*ALL)

This command causes the system to apply all journaled changes to all files in the MYCOLL collection since the last save. The receiver range is determined by the system. The changes are applied beginning with the first journaled change on the receiver chain after each file was last saved and continue through all applicable journal entries to the point at which the files were last restored.

All object-level entries (for example, CREATE/DROP/ALTER TABLE) for the MYCOLL collection are included. Commitment control boundaries are honored because the default value for the CMTBDY parameter, *YES, is used.

The product does have a few limitations. It is English-only. It does not cover IFS, data queue, or data area changes (you have to use the normal APYJRNCHG command to service these objects). And you cannot specify individual file names on which to apply journaled changes, as you can with APYJRNCHG; you must specify the whole library (that is, LIBRARY/*ALL for the FILE parameter).


Chris Hird first worked with High Availability (HA) at IBM Havant (UK) in 1989. He was responsible for the technical interface with the HA product's developer and for setting up a support structure in the UK to support the IBM installations. He spent a good deal of time installing the product at customer sites throughout EMEA before leaving IBM to set up Shield Software Services in 1993. Shield was an IBM Business Partner and became a MiMiX reseller on the purchase of the Multiple Systems Software by Lakeview Technology. Shield retained this status until being sold to another MiMiX reseller. Chris moved to Canada in 1997 and launched Shield Advanced Solutions (Canada) Ltd. Shield Advanced Solutions develops and provides tools and utilities aimed mainly at supporting HA environments. Chris still consults on HA implementations, using his broad knowledge of the iSeries to help customers gain the most from their investment in an HA product. He can be contacted by phone at 519-940-1192 or by email.

 
