Partner TechTip: Ensuring Reliable HA Failover and Switchover Processes

If your company has invested in high availability (HA) or is thinking about doing so, it's important to remember that a crucial function of a full-featured HA solution is the ability to move business-critical users and processes to a synchronized backup environment in the event of a hardware failure or site disaster. And for many companies, ensuring vital systems are running at all times means that this same capability can be equally important during planned system maintenance events such as hardware and OS upgrades.

Switchover vs. Failover

A controlled, deliberate process of moving to a synchronized backup is known variously as "switchover," "role swap," or "rollover." When done in an emergency, such as when the production system (source) suddenly fails, is damaged, or is destroyed, the process is referred to as a "failover." During a switchover, you have time to verify that data is synchronized and that all interfaces and applications can be enabled on the backup (target) system. But when a failover occurs, the luxury of verifying data and readiness is gone; you must enable the backup system as the production environment immediately.

Many companies purchase HA software only to keep data mirrored on a backup machine. This allows them to retrieve synchronized data from the backup in the event of a disaster; in other words, the backup is a data source rather than a standby production system, so there is no need (or ability) to execute a switchover or failover. This is certainly a reasonable use of the technology.

However, if your goal is to have a switch-ready environment (not just mirrored data), it is critical that the HA software you implement has robust functions designed to carry out this process smoothly. It is equally important that the HA vendor provides you with thorough training on the switchover/failover process and ensures that operators execute one or more successful tests. It's best to have an automated solution. A solution that depends on an extensive number of commands and user-roll scripts opens the door to errors and slows the roll process.

Preparing for a Switchover Test

In order to prepare properly for a switchover test, you need knowledge of the following:

  • IP addresses of both the production and backup systems
  • All hardware and software interfaces, including how they can be redirected to the backup system
  • All non-IP devices that would need to be moved to the backup
  • All devices with special configuration requirements (for instance, some handheld devices have a controller with a single IP address, while others have individual IP addresses associated with each device)
  • The method for ending jobs in critical subsystems
  • A plan for notifying users when switchover tests start, locking users out of systems during the test, and notifying users after the test completes
  • A plan to test applications in order to verify data integrity on the backup after the switchover
  • Keys from software vendors that allow their application(s) to run on the backup system
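
The preparation items above can be tracked as a simple readiness checklist. The following is a minimal Python sketch, not part of any HA product: the class name, field names, and example IP addresses are all hypothetical and chosen only to mirror the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwitchoverChecklist:
    """Hypothetical record of the preparation items listed above."""
    production_ip: str = ""
    backup_ip: str = ""
    interfaces_documented: bool = False        # hardware/software interfaces and how to redirect them
    non_ip_devices_documented: bool = False    # devices that must be moved to the backup
    special_devices_documented: bool = False   # e.g., handheld controllers
    subsystem_end_method_known: bool = False   # how to end jobs in critical subsystems
    user_notification_plan: bool = False       # notify, lock out, notify again
    application_test_plan: bool = False        # verify data integrity after the switchover
    vendor_keys_for_backup: bool = False       # license keys valid on the backup system

    def missing_items(self) -> List[str]:
        """Return the names of items that are still empty or False."""
        return [name for name, value in vars(self).items() if not value]

    def ready(self) -> bool:
        """True only when every item has been filled in."""
        return not self.missing_items()


# Example addresses are placeholders from the documentation range.
checklist = SwitchoverChecklist(production_ip="192.0.2.10", backup_ip="192.0.2.20")
if not checklist.ready():
    print("Not ready for a switchover test; missing:", checklist.missing_items())
```

Keeping the list in one place makes it easy to review before each test and to see at a glance which items still need attention.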


With proper training and preparation, as well as HA software that automates much of the process, you should feel confident with the switchover process after a few tests. Still, switchover is not something to test a few times and then forget; executing a successful failover depends on a regularly tested switchover process. IBM recommends testing switchovers at least once per quarter, and many companies test once a month. In fact, many companies whose backup system has resources similar to the production system go one step further: they run the business on one machine for a period of time (e.g., a month), switch over to the other machine, run there for the same period, and repeat the process periodically.
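
A regular test cadence is easy to check with simple date arithmetic. The sketch below is illustrative only: the function name is hypothetical, and "once per quarter" is approximated as 90 days, which is an assumption rather than a figure from IBM.

```python
from datetime import date

# Assumption: "once per quarter" is approximated as 90 days.
MAX_DAYS_BETWEEN_TESTS = 90


def switchover_test_overdue(last_test: date, today: date,
                            max_days: int = MAX_DAYS_BETWEEN_TESTS) -> bool:
    """Return True if more than max_days have passed since the last test."""
    return (today - last_test).days > max_days


# Hypothetical usage: pass max_days=30 for a monthly schedule.
if switchover_test_overdue(date(2024, 1, 1), date.today()):
    print("Switchover test is overdue; schedule one.")
```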

Smooth Switchover = Fast ROI

By regularly testing switchovers, you can be assured that if disaster occurs, the failover process (a switchover with the production system disabled) will be smooth and successful. In addition, your company will be able to perform system maintenance tasks with little or no downtime that otherwise would disable the production system for hours or days.

Check out iTera's system availability offerings in the MC Showcase Buyer's Guide.

Dale Porter is the chief high availability development architect at iTera, Inc., a leading provider of high availability and disaster recovery solutions in the iSeries world and the creator of Echo2 High Availability. For more information about iTera, call 801.799.0300.
