
20 Quick Disaster Recovery Tips


 

Disasters are rare, but when they occur, they can bankrupt an unprepared organization. Here are 20 quick tips that can help you to minimize a disaster's impact on your IT assets.

 

Disasters happen. They're uncommon, but they happen. Pretending otherwise serves only to amplify their negative effects.

 

In many people's minds, a "disaster" means a hurricane, earthquake, flood, fire, or other natural calamity or, possibly, a terrorist attack. Those clearly qualify as disasters, but for the purposes of this article, "disaster" has a broader meaning.

 

In this context, a disaster is any event that causes either of the following:

  • The destruction of all online operational copies of an organization's data and/or applications. "Online operational copies" includes both the production copies and any ready-to-run backup copies that can be placed in the production role immediately and, preferably, seamlessly.
  • The loss of access to all online operational copies of the organization's data and/or applications for a sufficiently long period such that a recovery operation will be faster and more cost-effective than waiting for the online operational copies to come back online.
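
The second condition is a comparison between waiting and recovering. A minimal sketch of that comparison follows, assuming you can estimate how long the outage will last, how long a recovery would take, and what downtime costs per hour. The function name, its parameters, and the sample figures are hypothetical and exist only to illustrate the definition.

```python
def should_recover(expected_outage_hours, recovery_hours,
                   downtime_cost_per_hour, recovery_cost):
    """Return True when recovering is both faster and cheaper than waiting.

    All four inputs are estimates that the organization must supply;
    nothing here comes from a standard formula.
    """
    # Waiting costs the full expected outage at the hourly downtime rate.
    cost_of_waiting = expected_outage_hours * downtime_cost_per_hour
    # Recovering costs downtime during the recovery plus the recovery itself.
    cost_of_recovering = recovery_hours * downtime_cost_per_hour + recovery_cost
    faster = recovery_hours < expected_outage_hours
    cheaper = cost_of_recovering < cost_of_waiting
    return faster and cheaper


# Made-up figures: a three-day outage versus a one-day recovery.
print(should_recover(expected_outage_hours=72, recovery_hours=24,
                     downtime_cost_per_hour=1000, recovery_cost=20000))
```

In practice the estimates are uncertain, so a check like this supports the decision rather than making it.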

 

In the event of a natural disaster or terrorist attack, the organization's first objective should be, clearly, to protect and maintain the safety and security of its employees and other people on its premises. Once this objective has been achieved, or if people have not been placed at risk by the situation, the highest-priority task of the IT department after a disaster is to get the business' critical systems running again as quickly as possible.

 

Failure to resume operations swiftly can compound the disaster and threaten the survival of the organization. According to one often-cited statistic from the U.S. Bureau of Labor Statistics, 40 percent of all companies that experience a disaster never reopen, and more than 25 percent of the remaining companies close within two years.

 

Thus, disaster recovery (DR) is crucial. Nonetheless, DR doesn't just happen. Furthermore, in the midst of the excessive stress that is inevitable in any DR process, if something can go wrong, there is a high probability that it will go wrong. And in any complex IT environment, many unimaginable things can go wrong.

 

Fortunately, there are a number of ways to lessen the chance of things going wrong and to reduce the impact of a disaster. The following are 20 quick tips that can help you to ensure DR success.

 

  1. Inventory your IT assets. To recover from a disaster, you first must know what needs to be recovered. If you haven't already done so, make a detailed inventory of all of your IT assets--both tangible and intangible. What hardware, software, and data will have to be recovered? Which skills will be required to perform the recovery operations and then run the business' systems at a backup location if necessary?

    The IT asset inventory list should be included in your disaster recovery plan, which is the subject of the next few tips.

  2. Maintain offsite data backups. A comprehensive tape archive strategy is crucial. To minimize recovery times in situations where the physical assets of the primary data center are still operational, you must be able to recover data from tapes that are stored locally.

    However, you also need to protect business operations against the risk of the destruction of the data center. Thus, you must also be able to recover from tapes at a secondary location.

    Having an up-to-date copy of backup data at a remote location is worth almost any price. A local fireproof vault is not an adequate alternative to offsite storage because, depending on the circumstances, the vault may not offer sufficient protection or it may not be accessible quickly after a disaster.
  3. Prioritize your data and applications. Data and applications are not all created equal. Assess the varying criticality of data and applications. Some of them are utterly essential to reestablish the business. Those applications and data must be restored first. Recovery of secondary applications and data can be deferred until the critical apps and data are restored. Your DR plan should explicitly state the recovery order of data and applications to reflect these priorities.

  4. Define detailed disaster recovery processes. After creating your IT asset inventory and prioritizing your IT assets, map out detailed, step-by-step instructions for recovering each IT asset, in the order in which they should be recovered.
  5. Don't omit "standalone" data. Increasingly, business-critical data and documents are stored on laptop and desktop computer disk drives. Your DR plan should include details on how this data will be backed up and recovered if lost.

    And remember, a laptop or desktop computer may be destroyed in the same disaster that strikes a data center. Therefore, it is not enough to back up PC-based data onto a network drive in the primary data center. Critical PC-based data must also be included in the offsite backup datasets.

  6. Formally document the plan. A disaster recovery plan that exists only in someone's head is no plan at all. Keep in mind that you are creating a plan to recover from a disaster. While we'd rather not consider the prospect, it's possible that some critical employees will not be available due to the effects of the disaster. Even if the worst doesn't happen, some key staff may be on vacation and unreachable during a recovery operation. If the recovery plan exists only in those people's heads, the available staff won't be able to execute the plan.

  7. Keep hard copies of the plan. There may be some efficiencies to be gained from storing a disaster recovery plan online. For example, it may be possible to automate the initiation of some of the recovery processes and use the system to enforce the completion of checklists. Nonetheless, also keep printed copies of the recovery plan in secure locations, including at the recovery site. A plan for restarting the organization's systems that is locked inside a system that is unavailable will be of no use when it comes time to initiate the recovery operations.

    Remember to replace the hard copies whenever the plan is updated.

  8. Keep multiple copies of the plan. A plan that exists only at the primary data center will be useless if the data center is destroyed. At a minimum, store a copy of the plan at the recovery site. Keeping additional copies of the plan at the homes of one or more of the key personnel who will be involved in the recovery operations will provide added safety and may allow those people to begin executing the plan without having to get to the recovery site first.

  9. Test the solution. In any complex system or process, what works in theory often fails in practice. Regular testing not only ensures that your recovery plan is viable, but also acts as a training tool. People who have already performed the recovery procedures a number of times during regular testing will be familiar with the plan and confident in their abilities to perform the required actions.

    You should test the recovery processes at least three or four times per year. Tests will often reveal flaws in your recovery plan. When this happens, be sure to update the plan to fix the flaws.
  10. Create and maintain a test script. Avoid using an off-the-cuff approach to DR testing. Maintain a test script that follows your DR plan as closely as possible and tests as much of it as possible. (For operational reasons, it may not be possible to test all aspects of a recovery operation during every test, but every effort should be made to leave as little as possible out of the DR tests.)

    Remember to update the test script when your DR plan changes.
  11. Consider disk-based remote backups. Traditional tape-based backups suffer from a variety of weaknesses. In addition to tape being slower than disk during backup and recovery operations, backup tapes are usually created only daily, typically at night. If a disaster occurs just before a new backup tape is created, there may be as much as a full day's worth of data that does not exist on any backup tape.

    Disk-based backup products that transmit changed data to an offsite location much more frequently than daily--perhaps even continuously--can reduce the volume of unsaved data, possibly to zero.
  12. Store required passwords in multiple locations. You never know what a disaster will throw at you. If system passwords are available only at the primary site, you may find that you are unable to access critical information if that site is destroyed.

    What's more, if only one person has the required high-level system passwords and that person stores them only in his or her head, you may be unable to restore your systems if that person is not available after a disaster. It is, therefore, essential to designate backups for all key staff.
  13. Ensure that backup procedures are followed. It sounds simple enough, but be sure that your data backup and protection procedures are followed rigorously on the prescribed schedule. After regularly backing up data for a long time without experiencing a disaster, and therefore not needing the backups, there is a tendency to become lax about compliance with backup policies. But, because you can't recover what you didn't save, this negligence could result in a business failure when a disaster does happen.
  14. Respect tapes' "best-before" dates. Tapes have a limited service life that is determined primarily by the number of times they are used. In addition to wearing out through use, tapes can become brittle and corrupted over time even if they aren't used.

    Tapes should be rotated regularly and replaced as they age. If your tape supplier provides life-expectancy estimates, replace tapes before the recommended expiry dates.

    Err on the side of caution. Tape life-expectancy values are only estimates. It is much less expensive to replace a tape that could have lasted for a few more runs than to find through brutal experience that you can't recover your data when necessary because a tape is unreadable.

    As a general rule of thumb, tapes used on a daily basis should be replaced every six to nine months to avoid deterioration. Other tapes should be replaced on a regular, less-frequent schedule based on the frequency of use.
  15. Maintain multiple communication channels. When you need to notify your staff about a DR event, you may not have access to normal communication channels. Email may not be working, or the phone system may be down. Consider text messaging, personal email addresses, etc. as alternative communication vehicles. In addition, there are third-party companies that can handle this communication for you.
  16. Automate as much as possible. Human error is possible under any circumstances. In particularly stressful situations, it is almost inevitable. Thus, the more of the recovery processes that you can automate, thereby removing the human element, the better.

    However, keep in mind that the systems responsible for automating the recovery operations may be unavailable after a disaster. Thus, just as your business applications and data need backups, you need manual backups for all of the automated recovery processes.
  17. Don't neglect security. When recovering from a disaster, it can be tempting to bypass your normal security protocols and policies in order to simplify and speed the recovery. Generally, this is a bad idea. Those security policies were established for a reason, and you don't want to create a potential security risk that can be as disruptive as, or more disruptive than, the disaster itself.
  18. View DR as an ongoing, evolving effort. Businesses change and grow, and their IT infrastructure, applications, and data evolve to support the changes and growth. As a result, a static DR plan will protect yesterday's data and applications, while leaving today's business operations exposed. Thus, don't approach DR as a one-time project, but rather as an ongoing exercise.
  19. Build a culture that emphasizes the importance of DR preparedness. If senior management is seen to have little concern for DR preparedness, that attitude will filter down to the front-line employees responsible for defining and executing the recovery processes and maintaining the backup data stores. Therefore, senior-level buy-in to business continuity initiatives is essential. In addition, that buy-in must be clearly communicated throughout the organization.
  20. Ask for help. Creating an effective DR plan can be challenging. A DR consultant with extensive knowledge and experience in the field can help. This allows you to leverage the experience of many companies and more effectively craft a plan that meets all of your business requirements at a cost that fits your budget and is justified by the benefits.

    Furthermore, it is human nature to stop noticing what is most familiar to us. A DR consultant may spot an unprotected data store, application, process, or piece of hardware that employees overlook because its use has become second nature to them.
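
Tips 1, 3, and 4 fit together naturally: an inventory, a priority for each item, and a recovery order derived from both. The sketch below shows one way this could be recorded, assuming a simple numeric priority in which 1 means "restore first." The `Asset` fields, the sample assets, and the plan section numbers are all hypothetical, so treat it as an illustration rather than a prescribed format.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str        # what has to be recovered
    kind: str        # "hardware", "software", or "data"
    priority: int    # 1 = restore first; larger numbers can wait
    runbook: str     # where the step-by-step recovery instructions live


def recovery_order(inventory):
    """Return the assets sorted by priority.

    sorted() is stable, so assets with equal priority keep the order
    in which they appear in the inventory.
    """
    return sorted(inventory, key=lambda asset: asset.priority)


# Hypothetical inventory; every name and priority here is made up.
inventory = [
    Asset("reporting database", "data", 3, "plan section 4.3"),
    Asset("order-entry application", "software", 1, "plan section 4.1"),
    Asset("customer master file", "data", 1, "plan section 4.2"),
    Asset("intranet wiki", "software", 2, "plan section 4.4"),
]

# Print the recovery sequence as a numbered checklist.
for step, asset in enumerate(recovery_order(inventory), start=1):
    print(f"{step}. {asset.name} ({asset.kind}): see {asset.runbook}")
```

A printed copy of a checklist like this belongs with the hard copies of the plan described in tip 7, because the system that generates it may itself be unavailable after a disaster.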

 

 

Bill Hammond
Bill Hammond directs Vision Solutions' product marketing efforts for information availability software solutions. Hammond joined Vision Solutions in 2003 with more than 15 years of experience in product marketing, product management, and product development roles in the technology industry. As director of product marketing at Vision Solutions, Hammond is responsible for product positioning and messaging, product launches, and marketplace intelligence for the company's high availability, disaster recovery, systems management, and data management solutions. 

 

 
