This article is an excerpt from System i Disaster Recovery and Planning.
Written by Richard Dolewski
Editor's Note: In Part 1, we covered issues 1 through 5. Now we finish up with issues 6 through 10.
Issue 6: Backups run longer than your backup window.
Backup window failures are backups that complete successfully but exceed the maximum time allowed for your backup window outage. This can creep up on IT. Usually your staff only becomes painfully aware of it when users start complaining. It is often overlooked because the backup job itself completes successfully and no errors are reported in the backup logs. It rarely happens over a short time period; generally, it creeps up gradually, just as system disk utilization does. The 2-hour window was not a problem a year ago, when the backups took only 1 hour 35 minutes to run. Now they are taking 2 hours 20 minutes. Record the start and end time of every backup so that your backup job times do not sneak up on you.
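As a minimal sketch of what recording start and end times buys you, the following Python compares each run against the window and flags runs that are getting close. The function name, the timestamp format, the dates, and the 80 percent warning threshold are all assumptions made for illustration; the start and end times would come from wherever you log your backups.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed format of the recorded timestamps


def check_window(start: str, end: str, window: timedelta,
                 warn_ratio: float = 0.8) -> str:
    """Classify one backup run against the allowed backup window.

    warn_ratio is a hypothetical early-warning threshold: a run that uses
    at least this fraction of the window is reported before it overruns.
    """
    duration = (datetime.strptime(end, TIME_FORMAT)
                - datetime.strptime(start, TIME_FORMAT))
    if duration > window:
        return "EXCEEDED"
    if duration >= window * warn_ratio:
        return "WARNING"
    return "OK"


window = timedelta(hours=2)
# A year ago: 1 hour 35 minutes against a 2-hour window.
print(check_window("2023-01-10 01:00", "2023-01-10 02:35", window))
# Today: 2 hours 20 minutes against the same window.
print(check_window("2024-01-10 01:00", "2024-01-10 03:20", window))
```

Run against every night's times, a check like this reports the trend while the job is still finishing inside the window, rather than after users start complaining.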
Consider when you go to the corner office and ask for a sum of money to buy some additional disk storage. The boss says sure, because the business case supports the fact that your company needs the disk. In IT, it's easy to forget to ask for more time to back up the data during this same request. You just received approval for 500 GB of disk. How are you going to back up this extra data? System ASP is forever increasing, and backup windows are shrinking. Are you backing up enough, or making concessions? Should you have asked for a faster tape drive or a second one along with the disk storage?
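The question of how long the extra data will take to back up is simple arithmetic once you know your effective save rate. The sketch below assumes the new disk eventually fills completely and that throughput stays constant. The 200 GB per hour figure is a placeholder, not a rating for any particular tape drive; measure your own rate from a recent backup.

```python
def extra_backup_minutes(added_gb: float, throughput_gb_per_hour: float) -> float:
    """Estimate the extra backup time, in minutes, for added data.

    throughput_gb_per_hour is the effective save rate you actually
    observe (data saved divided by elapsed time), not the drive's
    advertised maximum.
    """
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be greater than zero")
    return added_gb / throughput_gb_per_hour * 60


# 500 GB of new disk at an assumed 200 GB per hour.
print(extra_backup_minutes(500, 200))  # 150.0
```

Under those assumptions, the newly approved disk alone adds two and a half hours to a full save, which is more than the entire 2-hour window in the earlier example.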
Issue 7: Identify and back up orphan data.
Compare the libraries and IFS directories you have with those that are actually backed up. Perform a gap analysis. BRMS provides you with some level of reporting information. By examining the BRMS recovery report, you can clearly see when libraries have been backed up. Work through the report as if you were bringing back the entire system. You might be surprised to find that critical applications or application data are omitted, or saved only monthly or quarterly. Ouch!
If you do not have BRMS, run PRTDSKINF RPTTYPE(*LIB) to obtain a printout of every library on your system. (The report is built from data collected by RTVDSKINF, so run that first.) Examine your backup CLs, and enter the backup interval next to each library in the report. This might be a cumbersome exercise, but it sure is revealing.
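Once the library list and the backup intervals are typed into a file or spreadsheet, the gap analysis itself is a set comparison. Here is a minimal sketch in Python. The library names, the interval labels, and the rule that anything saved less often than weekly deserves a second look are all made up for illustration.

```python
def gap_analysis(system_libraries, backup_intervals,
                 acceptable=("daily", "weekly")):
    """Compare libraries on the system with those the backups cover.

    system_libraries: names taken from the library report.
    backup_intervals: library name -> interval noted from the backup CLs.
    Returns (never_saved, rarely_saved), each as a sorted list.
    """
    on_system = set(system_libraries)
    never_saved = sorted(on_system - set(backup_intervals))
    rarely_saved = sorted(
        lib for lib, interval in backup_intervals.items()
        if lib in on_system and interval not in acceptable
    )
    return never_saved, rarely_saved


# Hypothetical data for illustration only.
libraries = ["PAYROLL", "ORDERS", "NEWAPP", "ARCHIVE"]
intervals = {"PAYROLL": "daily", "ORDERS": "quarterly", "ARCHIVE": "monthly"}

never, rarely = gap_analysis(libraries, intervals)
print("Never saved:", never)    # ['NEWAPP']
print("Rarely saved:", rarely)  # ['ARCHIVE', 'ORDERS']
```

The first list is your orphan data. The second is the set of libraries whose interval you should confirm with the application owners.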
Remember those hard-coded CL programs? The libraries they list made sense five years ago when the program was written, but has nothing been added in five years? No new applications or libraries? You tell me. Use special values built for easy recovery, like *ALLUSR on the SAVLIB command, rather than listing specific libraries one by one. Always ask yourself how you would rebuild the system after a complete loss with these backups. The orphan data problem is an excellent example of an inconsistency that can result from poor backup administration.
Issue 8: Automate your backup process.
A key to successful data protection is consistency. As the complexities of the backup infrastructure continue to grow, automation can help by providing tools to facilitate success. Routine tasks, such as checking logs on a scheduled basis, are key, and they are prime candidates for automation. Deploying automated alerts for previously identified errors in job logs can make life easier. Scanning manually through a 100-page job log is not the way to go. Many third-party solution providers offer automated solutions for message management. Management Central, which comes with your system, should be given careful consideration as well.
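To show the idea behind alerting on previously identified errors, here is a minimal Python sketch that scans job log text for message IDs on a watch list. It assumes the job log has been exported to plain text, and that message IDs have the usual shape of three letters followed by four hexadecimal digits. The sample log lines are invented, and the watch list is illustrative only; substitute the IDs you have already identified in your own job logs. This is not how Management Central or any third-party product works internally.

```python
import re

# Three uppercase letters followed by four hexadecimal digits.
MESSAGE_ID = re.compile(r"\b[A-Z]{3}[0-9A-F]{4}\b")


def find_alerts(log_lines, watch_list):
    """Return (line number, message ID, text) for each watched message."""
    alerts = []
    for number, line in enumerate(log_lines, start=1):
        for message_id in MESSAGE_ID.findall(line):
            if message_id in watch_list:
                alerts.append((number, message_id, line.strip()))
    return alerts


# Invented sample lines, for illustration only.
sample_log = [
    "CPC3701 objects saved from library ORDERS",
    "CPF3771 some objects not saved from library PAYROLL",
    "CPC3701 objects saved from library ARCHIVE",
]

for number, message_id, text in find_alerts(sample_log, {"CPF3771"}):
    print(f"line {number}: {message_id}: {text}")
```

A scheduled job could run a check like this after each backup and send the results to the operator, so nobody has to page through the whole log.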
Issue 9: Backups must be integrated into the change-control process.
Your systems are, by their nature, very dynamic. System backup planning must be part of the strategic planning process, so all backup considerations must be part of the organization's formal change-control process. This demands a two-way relationship. Application changes directly and indirectly related to the backup infrastructure must be part of the notification, impact assessment, and contingency planning process that's included within change control. In every change control committee meeting, you must always consider how you will back out if the change goes bad during implementation and how you will support it when it goes into production. The backup infrastructure is a production system, just like the most important application in an organization's environment. It requires the same respect and support.
Issue 10: Leverage your technical expertise.
Backup environments are very complex and get more so with the introduction of every new hardware or software technology. IBM continues to add backup and recovery functionality to the native operating system, tape hardware, and BRMS. While much of this technology can be helpful, and it certainly all sounds good, there's a considerable challenge in understanding the functionality and how best to apply it in your own environment. All technical problems get resolved eventually, and opportunities to further enhance your backup environment are always available.