This old mantra is set up to support complacency.
A couple weeks ago, I upgraded an old Model 520 to a POWER9 S914 in Ontario, Canada. It was a great coup for a couple of reasons, the first being that these folks were off IBM hardware and software maintenance for about 10 years or so. They were having physical problems with the old machine, and it was really on its last legs. Secondly, considering we brought them from IBM i 5.4 all the way up to version 7.2, we had to navigate past a few object observability issues. We had to work with a couple of software vendors to get updated objects, which worked out marvelously.
All in all, it was a great success. The customer is now on a supported release of IBM i and on supported hardware.
During some of the “hurry up and wait” time you find during any migration, I decided to make myself useful and review the state of the backups. I did a WRKOBJ QSAVSYS to find out the date of the last SAVSYS (which was actually part of a GO SAVE option 21). It was 18 months ago. That isn’t too great, so I advised them that it’s best to do one at least once a year. But since they’re not a 24x7 shop, I encouraged them to do it once every three to six months. Nobody ever got fired for doing too many full-system saves.
Then I turned my attention to the nightly backup routine. That’s when I really saw how things could’ve gone sideways for the customer.
The backup ran every night like clockwork. Unfortunately, the backup program hadn’t been changed in about 15 years. It was hard-coded to save exactly 12 libraries. The rub? The customer had 55.
No security data. No DLOs. No IFS. No configuration.
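The size of that gap is easy to put a number on. The following is a minimal sketch, not the customer's actual program: it assumes you can obtain two plain lists of library names (what the backup saves and what exists on the system), and the `APPLIBnn` names are hypothetical placeholders.

```python
def backup_coverage(saved, on_system):
    """Return (missing_libraries, percent_covered) for a hard-coded save list."""
    saved_set = {name.upper() for name in saved}
    system_set = {name.upper() for name in on_system}
    missing = sorted(system_set - saved_set)
    covered = len(system_set & saved_set)
    # An empty system has nothing to miss, so treat it as fully covered.
    percent = 100.0 * covered / len(system_set) if system_set else 100.0
    return missing, percent


# Hypothetical names standing in for the 55 libraries on the system,
# of which the old backup program saved only the first 12.
on_system = [f"APPLIB{i:02d}" for i in range(1, 56)]
saved = on_system[:12]

missing, percent = backup_coverage(saved, on_system)
print(f"{len(missing)} libraries not saved; coverage {percent:.0f}%")
# prints: 43 libraries not saved; coverage 22%
```

Even this count only covers libraries. It says nothing about the security data, DLOs, IFS, and configuration that were missing entirely.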
I quickly edited and compiled the backup program to get all that good stuff.
As I was flying back home, I thought about what could’ve gone wrong. They were incredibly lucky that the old machine didn’t die before we did the migration. If it had, they would’ve been in a real pickle.
It’s very important to run Save Security Data (SAVSECDTA) regularly (i.e., nightly). There’s no reason not to. In the case of the example company, any user profiles created since the last full system save would’ve had to be recreated.
Document Library Objects (DLOs) can be incredibly important for customers with old software packages. When I find machines running V5R4 or older, I usually find they make heavy use of DLOs. Save Document Library Object (SAVDLO) should be run nightly.
Integrated File System (IFS) objects can be important too. These folks didn’t have many custom IFS directories. But if you do, you need to be backing them up.
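To make the list concrete, here is a rough sketch of the save commands a fuller nightly routine might issue. It is an illustration under assumptions, not the program I compiled on site: the device name `TAP01` and the helper `nightly_save_commands` are hypothetical, the sequence only approximates what GO SAVE option 23 does, and details such as ending subsystems, save-while-active, and tape end options are deliberately left out. Check every parameter against the IBM i documentation for your release.

```python
def nightly_save_commands(device="TAP01"):
    """Build CL command strings for a nightly save of user data.

    The device name is an assumption; substitute your own tape device.
    """
    return [
        # User profiles, authorities, and other security data
        f"SAVSECDTA DEV({device})",
        # Line, controller, and device descriptions
        f"SAVCFG DEV({device})",
        # All user libraries instead of a hard-coded list
        f"SAVLIB LIB(*ALLUSR) DEV({device})",
        # Documents and folders (DLOs)
        f"SAVDLO DLO(*ALL) FLR(*ANY) DEV({device})",
        # IFS, omitting what SAVLIB and SAVDLO already cover
        f"SAV DEV('/QSYS.LIB/{device}.DEVD') "
        "OBJ(('/*') ('/QSYS.LIB' *OMIT) ('/QDLS' *OMIT))",
    ]


for command in nightly_save_commands():
    print(command)
```

Using `*ALLUSR` rather than naming libraries is the design point: a library created next year gets saved without anyone having to remember to edit the program.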
Still on the plane, my mind started to wander to all the possible problems with backup jobs:
- If your backup program takes four minutes to complete, you might not be saving much.
- If the guy who wrote your backup program hasn’t been seen in 15 years, it’s time to review the source code.
- If your backup program was last compiled in 1994, you need to review what it does.
- If you don’t know the name of your backup program, you’ve got a surprise date with misery sometime in the future.
- If you’ve never tried to recover from your backup, how do you know it even works?
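Those warning signs can be turned into a quick self-check. The sketch below is only that: the `BackupFacts` fields and the thresholds (10 minutes, five years, roughly six months) are my own assumptions for illustration, and the sample dates are hypothetical. Tune them to your shop.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumed thresholds; adjust to your environment.
MIN_PLAUSIBLE_MINUTES = 10      # a shorter run probably saves very little
MAX_SOURCE_AGE_DAYS = 5 * 365   # review the program at least this often
MAX_FULL_SAVE_AGE_DAYS = 183    # roughly six months between full-system saves


@dataclass
class BackupFacts:
    run_minutes: float
    last_compiled: date
    last_full_save: date
    last_restore_test: Optional[date] = None  # None means never tested


def review_backup(facts: BackupFacts, today: date) -> List[str]:
    """Return a list of warnings about a backup routine."""
    warnings = []
    if facts.run_minutes < MIN_PLAUSIBLE_MINUTES:
        warnings.append("Backup finishes suspiciously fast; check what it saves.")
    if (today - facts.last_compiled).days > MAX_SOURCE_AGE_DAYS:
        warnings.append("Backup program is old; review the source code.")
    if (today - facts.last_full_save).days > MAX_FULL_SAVE_AGE_DAYS:
        warnings.append("Full-system save is overdue.")
    if facts.last_restore_test is None:
        warnings.append("No restore has ever been tested.")
    return warnings


# Hypothetical example: a four-minute backup compiled in 1994.
facts = BackupFacts(
    run_minutes=4,
    last_compiled=date(1994, 6, 1),
    last_full_save=date(2018, 3, 1),
)
for warning in review_backup(facts, today=date(2019, 9, 15)):
    print(warning)
```

If you cannot fill in the fields at all, that is the check that matters most: it means nobody knows what the backup program does.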
It’s these little things, such as nightly backup procedures that have run since forever, that I have a major problem with. I’ve mentioned this before, I’m sure, but the worst attitude our community has about IBM i is that “it sits there and runs.” That mantra is set up to support complacency. You see it in the lack of modernization. You see it in the lack of security. You see it offered as a reason when people don’t upgrade or PTF their machines regularly. And yeah, you see it in the backups or lack thereof. Even worse, I’ve seen companies let their hardware maintenance lapse because the system is so rock solid.
Now that doesn’t mean that nobody does upgrades or PTFs or modernization or security or backup and recovery well. But “it just sits there and runs” is certainly the most common reason given when those very important IT tasks aren’t done. And it isn’t as much a reason as it is an excuse.
I’ll give you another example that should’ve been foreseen.
I had someone call me last month with a disk drive failure on a Model 515 (no maintenance contract when that machine went out of service earlier in the year). They had been warned. I got them some used drives and went to do the repair. When I got there, there was no working twinax console. Nobody knew the SST/DST passwords, so I couldn’t install the new drive. No console meant no resetting the QSECOFR password and then forcing DST to the console to change it. No console also meant no restricted state to do a Save 21, which hadn’t been done in eight years. What a mess! Management is now trying to push the system to their new ERP provider on Windows. But you know what? The box is still sitting there and running. The company will be lucky if they get migrated before their luck runs out.
We’re in hurricane season now. I just had Dorian rip my gazebo to pieces after it tore up the rest of Nova Scotia. Power was out for a while too. There are a lot of parallels here. There are always blind spots in any disaster recovery plan, my generator being one. I didn’t test it this past summer. When we lost power, I found out I needed a new spark plug. It would have been a simple $5.99 fix and only 20 minutes to test it. Instead, for 24 hours, my family was in the dark. Three kids with no electronics and no Internet. Dear old Dad is getting the blame for sure.
When the next hurricane comes up the East Coast, you can be sure I’ll have already tested the generator and will have two cans of gasoline ready to go.
But I’m not running a data center here. Are you prepared for the next storm? Or the next hardware failure? Do you have a maintenance contract with IBM? If you’ve got a machine going end-of-service at the end of September, do you have an extended maintenance contract? There’s a big difference. I know shops that think their standard hardware maintenance will cover them into November, when their contract expires. Nope. The weeks after September 30 will require extended hardware maintenance. Do you have a disaster recovery plan? Does step one in that plan include “get the backup tape”? If so, you’d best be sure you have everything you need on it.