
Consolidation and Availability: Protecting the Eggs in Your IT Basket


You've consolidated multiple servers onto one larger machine. Great! Now, what happens if that one server goes down or is taken offline for maintenance?

 

An increasing number of companies have consolidated or are considering consolidating multiple servers onto one or a few centralized systems. Their reasons for doing so are varied. The following are among those most often cited:

  • Go green. One large server consumes less power and, therefore, has a smaller carbon footprint than the total of the several smaller servers it replaces. Obviously, this also reduces electricity costs.
  • Reduce capital costs. All other things being equal, a powerful server costs more than one that delivers lower performance. However, because a high-powered server can eliminate the need for several lower-powered machines, total capital costs are typically lower when running a single system, particularly when all of the virtual servers running on it can share storage.
  • Cut human resource costs. Every server has to be administered and maintained. Consequently, consolidating servers onto one or a small number of systems reduces administration and maintenance overhead burdens.
  • Reduce software costs. Each server needs a license for the software that runs on it. Consolidating servers may allow you to eliminate some of those licenses and any associated maintenance fees. (Your ability to reduce licensing and maintenance costs depends on the terms of the software licenses and whether you consolidate by running multiple virtual servers or by running multiple applications under a single virtual/physical server.)
  • Reduce cooling costs. The air conditioning requirements of one large server are typically considerably less than for the several smaller servers it replaces.
  • Improve security. It is easier to place a tight security wall around a single server--and vigilantly maintain that wall--than it is to secure a large collection of isolated servers, particularly if those servers are geographically dispersed.
  • Scale applications and balance processing loads on the fly. By using a physical system that is capable of supporting multiple virtual servers and dynamically allocating resources among them, it is possible to scale application performance and balance loads across the virtual machines in near real-time.

Many CIOs recognize the benefits of consolidation, but one important issue is often overlooked until the consolidation project is well underway or complete. Once all of your IT eggs are in one basket, if that basket drops, all of the eggs will break.

 

When a relatively small server runs an isolated application supporting only one business function for one department, if that server fails or needs to be taken offline for maintenance, only a small area of the business may be affected. As long as there are no critical interdependencies, other business operations can continue unabated.

 

The same is not true after consolidating servers onto a single physical box. If that box is destroyed in a disaster, needs to be taken offline for maintenance, or otherwise becomes unavailable, then everything running on it stops. If the entire business is running off a single server, all operations will shut down when that server becomes unavailable, and they will stay shut down until it is brought back online.

All-Day Breakfast: Protect the Eggs

Web-based sales and service, globalization, and competitive pressures that force companies to maximize asset utilization by keeping factories running around the clock are now widespread facts of life. As a result, many (if not most) companies have at least some applications that must run 24x7. Downtime--any downtime, whether due to scheduled maintenance or unexpected events--is unacceptable for these applications. Consequently, organizations strive to achieve very high availability (HA) and rapid disaster recovery (DR) for the servers running those critical applications. That being said, even the most globalized, Web-enabled company normally has a few applications that can tolerate some downtime, as well as some data loss and longish recovery times after a disaster.

 

When the critical and non-critical applications each run on separate servers, an appropriate level of investment in availability protection can be chosen for each application. However, after the critical and non-critical applications are merged onto a single physical system, the availability of that system must meet the standard required by the most-critical application. Thus, if the most-critical application must be available around the clock, then the physical machine must be protected against any downtime anytime.
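
The rule in the preceding paragraph can be written as a one-line calculation. The following is only a sketch: the application names and the downtime tolerances are hypothetical figures chosen for illustration, not measurements from any real environment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class App:
    name: str
    max_downtime_minutes_per_year: float  # hypothetical tolerance figure

def host_downtime_budget(apps):
    """A consolidated host inherits the tightest downtime tolerance of any app on it."""
    if not apps:
        raise ValueError("no applications given")
    return min(a.max_downtime_minutes_per_year for a in apps)

# Hypothetical mix of critical and non-critical applications
apps = [
    App("order_entry", 5.0),
    App("payroll", 480.0),
    App("dev_test", 10_000.0),
]

budget = host_downtime_budget(apps)
print(f"Consolidated host must stay within {budget} minutes of downtime per year")
```

On separate servers, only order_entry would need the five-minute standard. Once all three share one physical machine, that machine has to meet it.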

 

Its partitioning capabilities, its single-level storage, and its ability to run multiple virtual servers utilizing different operating systems--including IBM i, AIX, and Linux--make IBM i running on a Power Systems server a good choice as a consolidation platform. What's more, unlike some other platforms, IBM i on Power Systems provides its virtualization facilities through PowerVM, without the need for separate virtualization software.

 

IBM i partitions cannot run Windows directly, but both Windows and Linux can be run on an integrated blade server, on an Integrated xSeries Server, or through an Integrated xSeries Adapter. IBM i can then share resources with these Windows servers. By running VMware ESX Server, integrated BladeCenter and xSeries servers can also be subdivided into multiple virtual servers that are allocated their own virtual resources.

 

HA and DR products have been available for some time on all of the common server operating system platforms. However, they are typically single operating system solutions. This can present a problem when consolidating multiple virtual servers utilizing different operating systems onto a single physical system. To restore operations after a primary system failure or when the primary system needs to be shut down for maintenance, it is often necessary to coordinate the switchover to the various virtual servers on the backup system.

 

This intricate coordination can be difficult when using a collection of standalone, single-operating-system HA/DR products from a variety of vendors. In contrast, selecting a suite of HA/DR software with a common interface across all platforms can greatly simplify the coordinated switchover tasks, while also speeding the switchover process and reducing the likelihood of human error. A tool that helps with the coordination is beneficial, but even more important is a vendor with the necessary expertise in its customer support center to help you in your recovery efforts.

Multiple-System Redundancy

The ultimate in availability protection comes when you set up a backup physical server. An HA product can then maintain real-time or near real-time copies of all of the data, applications, and other objects in all of the virtual servers running on the production physical server. This results in a fully replicated system, including replicas of all of the virtual servers running under it.

 

Having a real-time or near real-time backup is only half of the high availability solution. You must also be able to rapidly swap the roles of the production and backup servers when required. To this end, sophisticated HA software includes functionality on the backup server that monitors the availability of the production server and automatically fails over to the backup when the primary server is unavailable.
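
As a rough illustration of the monitoring logic described above, the sketch below counts missed heartbeats and switches the active role to the backup once a threshold is reached. The class name, the threshold of three missed heartbeats, and the heartbeat mechanism itself are assumptions made for the example; commercial HA products use their own detection methods.

```python
class FailoverMonitor:
    """Runs on the backup server and watches heartbeats from production (illustrative only)."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # assumed threshold before failing over
        self.missed = 0
        self.active = "production"

    def record_heartbeat(self, received):
        """Record one heartbeat interval and return which system is active afterwards."""
        if self.active == "backup":
            return self.active  # already failed over; switching back is a separate, planned step
        self.missed = 0 if received else self.missed + 1
        if self.missed >= self.max_missed:
            self.active = "backup"
        return self.active

monitor = FailoverMonitor(max_missed=3)
for received in [True, False, False, True, False, False, False]:
    state = monitor.record_heartbeat(received)
print(state)
```

Requiring several consecutive misses is one simple way to keep a brief network glitch from triggering an unnecessary role swap.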

 

One way to implement this HA environment is to dedicate one physical server to run only production virtual servers and another to run only the backup virtual servers, as depicted in Figure 1. During normal operations, any updates applied to data, applications, or objects in the partitions on the production system (labeled System A) are copied to the replica partition on the backup system (System B).

 


Figure 1: One physical server runs only production virtual servers; the other runs only the backup virtual servers.

 

In the scenario depicted in Figure 1, when System A fails or is taken offline for maintenance, users resume working on System B after a role swap. While System A is offline, the HA software--or the journal if the HA software uses it as a change-capture mechanism--captures the changes made on System B, and the HA software resynchronizes the two systems when System A returns to service.
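
The capture-and-resynchronize cycle can be sketched in a few lines. This is a simplified model, not how any particular HA product or the IBM i journal works internally: the Replica and HAPair classes, the key/value data, and the in-memory pending list are all hypothetical stand-ins.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True

def apply_change(replica, change):
    """Apply one (key, value) change; a value of None means the key was deleted."""
    key, value = change
    if value is None:
        replica.data.pop(key, None)
    else:
        replica.data[key] = value

class HAPair:
    def __init__(self, primary, backup):
        self.active, self.standby = primary, backup
        self.pending = []  # changes captured while the standby is unreachable

    def write(self, key, value):
        change = (key, value)
        apply_change(self.active, change)
        # Queue the change if the standby is down or still has a backlog,
        # so that changes always reach it in their original order.
        if self.standby.online and not self.pending:
            apply_change(self.standby, change)
        else:
            self.pending.append(change)

    def role_swap(self):
        self.active, self.standby = self.standby, self.active

    def resync(self):
        for change in self.pending:
            apply_change(self.standby, change)
        self.pending.clear()

a, b = Replica("System A"), Replica("System B")
pair = HAPair(a, b)
pair.write("order1", "open")      # replicated to B in normal operation
a.online = False                  # System A fails or goes down for maintenance
pair.role_swap()                  # users continue on System B
pair.write("order1", "shipped")   # captured for later
pair.write("order2", "open")
a.online = True                   # System A returns to service
pair.resync()
print(a.data == b.data)
```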

 

If you take this approach, and if all applications running on all of the virtual servers must run around the clock without any performance degradation, then both the production and backup servers must be sized sufficiently to handle the full production load.

 

Using Power Systems and IBM i, each partition operates independently, and, apart from the primary partition, each can be shut down without affecting the other partitions. Thus, when you need to perform maintenance on a single partition, such as to upgrade the operating system or the applications running on it, it is not necessary to perform a role swap for the whole physical machine. Instead, you can simply role swap the single partition.

 

If some applications can be shut down or curtailed when an emergency shuts down the primary server or when it must be taken offline for maintenance, then the physical machine that normally serves as a backup can be sized smaller than the normal production machine. Development and test partitions are examples of partitions that can be shut down in an emergency, but there might be non-critical production applications that fit into this category as well.

Shared Roles

If some processing can be curtailed during emergencies and planned downtime, there is also a way to reduce the size of both the production and the backup servers, without hampering performance during normal operations. As depicted in Figure 2, System A can run production partitions for some virtual servers that are backed up on System B. Simultaneously, System A can also serve as a backup for other production virtual servers that run on System B.

 


Figure 2: Both systems run production virtual servers; each backs up the other.

 

In this case, each physical machine need only be sized to handle the production load that will run on it, plus the small replication-processing load for the backup virtual servers it supports.
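
A back-of-the-envelope calculation shows how the shared-roles sizing works. Everything in this sketch is hypothetical: the partition names, the loads (in arbitrary capacity units), and the replication cost of one unit per replica. It also checks the load after a role swap, since the sizing rule above holds only when the non-critical work that is shut down frees at least as much capacity as the critical work taken over.

```python
def normal_load(own, hosted_replicas, replication_cost=1):
    """Normal operation: own production work plus replication for each hosted replica."""
    return sum(p["load"] for p in own) + replication_cost * len(hosted_replicas)

def failover_load(own, taken_over):
    """After a role swap: own non-critical work is shut down, critical work from both systems runs here."""
    own_critical = sum(p["load"] for p in own if p["critical"])
    return own_critical + sum(p["load"] for p in taken_over)

# Hypothetical partitions; loads are in arbitrary capacity units
system_a = [{"name": "erp", "load": 40, "critical": True},
            {"name": "reporting", "load": 45, "critical": False}]
system_b = [{"name": "web", "load": 40, "critical": True},
            {"name": "dev_test", "load": 45, "critical": False}]

a_replicas = [p for p in system_b if p["critical"]]  # B's partitions backed up on A

a_normal = normal_load(system_a, a_replicas)
a_failover = failover_load(system_a, a_replicas)
a_required = max(a_normal, a_failover)

# For comparison: one machine carrying every production partition (the Figure 1 layout)
single_production = sum(p["load"] for p in system_a + system_b)

print(a_normal, a_failover, a_required, single_production)
```

With these made-up numbers, System A needs 86 units in normal operation and 80 after taking over System B's critical partition, so sizing for the normal load is enough. A dedicated production machine running all four partitions would need 170 units.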

 

When either System A or B becomes unavailable or must be taken offline for maintenance, non-critical applications on the other system can be quiesced. Role swaps can then be performed so that the functioning server assumes the production role for all critical applications.

 

The separation between the primary and backup systems, labeled "critical distance" on Figures 1 and 2, is an important consideration. If the two machines are in the same building, an emergency or disaster that shuts down one machine will likely shut down the other as well. This setup will, therefore, serve to maintain availability only after a single system failure or when the primary system must be shut down for maintenance.

 

Locating the two systems in different buildings, but still within a single office campus, will protect availability in situations that affect only one building, such as an air conditioning failure. However, a wider-area disaster, such as a hurricane, an earthquake, or a large fire, can take both buildings out of service and may still halt business operations.

 

The only way to protect availability against site-wide and regional disasters is to separate the primary and backup systems by a significant distance, preferably locating them in different cities using independent power grids. This way, a disaster that strikes one site will be unlikely to affect the other.

Data Vaulting

The nature of some companies' operations is such that they can tolerate long periods of downtime after a disaster, provided it doesn't happen too frequently. For these companies, the primary objectives are to protect their data and to return to operations as quickly as possible, even if that might be a matter of hours rather than the minutes that would be the case when using an HA solution.

 

These companies can take advantage of a DR solution that offers better data protection and faster recovery times than the tape-based backup options that they are likely using now.

 

The problems with tape-based backups are well known. First, because backups are typically taken only once a day, usually at night, updates applied after the last backup may be lost after a disaster. Second, it may take an intolerable amount of time to recover operations from tape, particularly if those tapes have to be retrieved from an offsite location.

 

Disk-based backup solutions, which often go by the name of data vaulting, provide an alternative to tape. Data vaulting products capture changes applied to a production system and electronically transfer them to another system that acts as a vault. Unlike with HA products, the changes are not continuously applied to a server replica at the vault location. Instead, after a disaster, data and applications are loaded onto a production server from tape. The production server is then brought up to date by using the data in the vault to apply any updates made after the last backup tape was created.
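
The two-step recovery just described (restore from tape, then roll forward from the vault) might look like the following sketch. The sequence numbers, the tuple layout of a vaulted change, and the sample records are assumptions made for illustration; real vaulting products define their own formats.

```python
def recover(tape_image, tape_seq, vault_changes):
    """Restore the tape image, then apply vaulted changes made after the tape was written.

    tape_seq is the (hypothetical) sequence number of the last change on the tape.
    Each vaulted change is (sequence, key, value); a value of None means a deletion.
    """
    data = dict(tape_image)
    for seq, key, value in sorted(vault_changes, key=lambda c: c[0]):
        if seq <= tape_seq:
            continue  # already contained in the tape image
        if value is None:
            data.pop(key, None)
        else:
            data[key] = value
    return data

tape_image = {"cust1": "A", "cust2": "B"}
vault_changes = [
    (99, "cust1", "A"),    # made before the backup, so it is skipped
    (101, "cust2", "B2"),  # update made after the backup
    (102, "cust3", "C"),   # record created after the backup
    (103, "cust1", None),  # record deleted after the backup
]

restored = recover(tape_image, tape_seq=100, vault_changes=vault_changes)
print(restored)
```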

 

Depending on how frequently changes are transmitted to the vault, vaulting can provide data protection almost as good as that provided by an HA solution. Recovery times will be significantly longer than in HA environments; however, they may still be faster than when using tape backups alone, as the updates applied to the system after the last backup, but before the disaster struck, won't have to be re-entered manually.
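
To put numbers on the difference in data protection, the sketch below computes how much work is exposed to loss at the moment of a disaster. The dates, the 2:00 a.m. nightly backup, and the 15-minute vault transfer interval are all hypothetical.

```python
from datetime import datetime, timedelta

def last_copy_before(event, first_copy, interval):
    """Time of the most recent scheduled copy at or before the event."""
    completed = (event - first_copy) // interval  # whole intervals elapsed
    return first_copy + completed * interval

disaster = datetime(2024, 1, 10, 16, 40)  # hypothetical time of the disaster
schedule_start = datetime(2024, 1, 1, 2, 0)  # both schedules start at 2:00 a.m.

tape_exposure = disaster - last_copy_before(disaster, schedule_start, timedelta(days=1))
vault_exposure = disaster - last_copy_before(disaster, schedule_start, timedelta(minutes=15))

print("Updates at risk with nightly tape:", tape_exposure)
print("Updates at risk with 15-minute vaulting:", vault_exposure)
```

In this example, the nightly tape leaves 14 hours and 40 minutes of updates unprotected, while the vault leaves 10 minutes.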

 

One of the advantages of vaulting is that, because the vault is never called upon to act as a production server, the system it runs on does not have to be particularly powerful. In addition, the vault does not need to run the same operating system as the production system. Depending on the vendor, vaulting software can often capture changes from, say, an IBM i, an AIX, or a Linux server and store them in a vault running on, for example, a Windows or Linux system. When required, the vaulting software can restore data from the vault to a production system running on the required operating system.

 

Vaulting offers a benefit that is not provided by either HA or tape-based backup solutions alone. An HA solution maintains an always-current replica, so when an individual data item becomes corrupted or is accidentally deleted, that change is replicated to the backup as well. The HA software therefore offers no way to recover the item unless the HA product includes a feature called continuous data protection (CDP). It might be possible to recover corrupted or deleted data from a backup tape, but this is a very time-consuming, complex process, and it will not yield the correct result if the data item was updated or created after the last backup tape was made. Data vaults, on the other hand, provide this data recovery facility as part of their inherent functionality.
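
One way to picture this kind of recovery is as a lookup in an ordered change history. The sketch assumes the vault keeps each captured change with a sequence number, which is an assumption about the product, and the sample data is invented.

```python
def value_as_of(changes, key, seq_limit):
    """Return the last value written to key at or before seq_limit (None if absent).

    Each change is (sequence, key, value); a value of None records a deletion.
    """
    result = None
    for seq, changed_key, value in sorted(changes, key=lambda c: c[0]):
        if seq > seq_limit:
            break
        if changed_key == key:
            result = value
    return result

changes = [
    (1, "price", "10.00"),
    (2, "price", "12.50"),
    (3, "price", None),  # accidental deletion
]

# Recover the item as it stood just before the accidental deletion
recovered = value_as_of(changes, "price", seq_limit=2)
print(recovered)
```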

 

The ability to run the vault on a significantly smaller system than the production server means that companies that have consolidated previously independent servers may be able to perform vaulting without buying any additional hardware. Instead, they may be able to run the vault on one of the servers that would otherwise be discarded.

Testing

Apart from installing the HA/DR technology, the three most important elements of any HA or DR strategy are testing, testing, and testing. Murphy typically makes his presence well known in the midst of a disaster, when stress levels are particularly high. If something can go wrong, it almost certainly will in these instances. And the more complex your IT environment, such as one supporting multiple virtual servers on a single physical machine, the more there is to go wrong.

 

What's more, it's rarely something obvious that trips you up. Something as seemingly minor as failing to keep user profiles up to date on the backup machine may make it impossible to resume operations. The only way to be certain that a role swap will work as it should when a disaster strikes is to thoroughly test the entire role swap process--including the human processes--in advance.
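
Some of these checks are easy to automate between full role-swap tests. As one small example, the sketch below compares the user profiles on the two machines. It assumes the profile names have already been exported from each system by whatever means is available; the names shown are made up.

```python
def missing_on_backup(production_profiles, backup_profiles):
    """Return profiles that exist on production but not on the backup, in sorted order."""
    return sorted(set(production_profiles) - set(backup_profiles))

# Hypothetical exports of user profile names from each system
production_profiles = ["ALICE", "BOB", "CAROL", "QSECOFR"]
backup_profiles = ["ALICE", "QSECOFR"]

gaps = missing_on_backup(production_profiles, backup_profiles)
print(gaps)
```

A check like this complements, but does not replace, a full test of the role swap process.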

Is Server Consolidation Right for You?

The benefits of server consolidation are many, but before you put all of your IT eggs in one server basket, make sure that you have carefully thought through all of the availability implications. The consequences of an outage on a consolidated server can be severe, but the increased criticality of a single basket should not be a deterrent to server consolidation. With the proper HA and DR technologies, processes, and policies in place, the resulting consolidated server can be even more highly available than your existing IT environment.

Bill Hammond
Bill Hammond directs Vision Solutions' product marketing efforts for information availability software solutions. Hammond joined Vision Solutions in 2003 with more than 15 years of experience in product marketing, product management, and product development roles in the technology industry. As director of product marketing at Vision Solutions, Hammond is responsible for product positioning and messaging, product launches, and marketplace intelligence for the company's high availability, disaster recovery, systems management, and data management solutions. 

 

 
