Historically, many IT shops took data and application availability for granted, but what "taking it for granted" meant differed from one organization to the next. In some cases, a certain rationale was assumed: "Of course we have to protect the availability of our systems; if they're not available, our users can't do their work." In other cases, availability itself was assumed: "Hardware almost never fails, so there's no need to worry about availability." Neither attitude serves the business well.
The IT department might intuitively recognize the need for high availability, but in the quantitative, dollars-and-cents world of the CFO, "we need it because we need it" doesn't cut it. If that's your only justification for investing in higher availability, your systems will likely remain unprotected.
Depending on hardware reliability to ensure data and application availability is perilous. Yes, hardware is very reliable these days, but that only means it rarely breaks down. It doesn't mean that a disaster or some lesser incident, such as a power failure or an operator error, won't force critical data and applications offline. Moreover, reliable hardware does nothing to eliminate the need for frequent maintenance, which often requires shutting systems down.
Thus, to fulfill the needs of the business, IT managers must justify, in hard-dollar terms, an investment in high availability and, after winning approval for it, take the necessary steps to combat the otherwise inevitable downtime. The case can be made by measuring the cost of each hour of downtime and the number of downtime hours that can be avoided. Multiplying those two values usually results in a number that is far higher than the cost of improving availability, thereby proving the potential for a large return on investment.
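That back-of-the-envelope calculation can be sketched in a few lines. The figures below are purely hypothetical; substitute your own estimates of hourly downtime cost, avoidable downtime, and the cost of the availability solution:

```python
# Hypothetical figures for illustration only -- replace with your own estimates.
cost_per_downtime_hour = 100_000   # dollars lost per hour of downtime
avoidable_hours_per_year = 50      # downtime hours the HA investment would prevent
ha_investment = 1_500_000          # annual cost of the high availability solution

# Annual loss avoided = hourly cost x avoidable hours
annual_loss_avoided = cost_per_downtime_hour * avoidable_hours_per_year

# Simple return on investment
roi = (annual_loss_avoided - ha_investment) / ha_investment

print(f"Loss avoided: ${annual_loss_avoided:,}")  # Loss avoided: $5,000,000
print(f"ROI: {roi:.0%}")                          # ROI: 233%
```

Even with these made-up numbers, the loss avoided dwarfs the cost of the solution, which is exactly the argument a CFO can act on.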
The good news—or bad news, depending on your perspective—is that, because of strong trends in the three areas of continuity, consolidation, and compliance, IT managers now find it much easier to justify high availability investments. In fact, many CFOs are now or soon will be beating down the doors of their IT departments to demand that IT managers take immediate action on the availability front.
Over the past half-decade, screaming headlines have recounted weather-related, seismic, and terrorist catastrophes, putting disasters at the top of many people's minds. The threat to human life is, and should be, the paramount concern in these circumstances, but it's not the only concern. Today, most people's livelihoods, and often their lives, depend on information technology to some extent. In the face of a disaster, at least one that can be predicted in advance, people can evacuate, but there is rarely the time or logistical capacity to move computing hardware on such short notice.
Obviously, disasters are not the only causes of unexpected downtime. Hardware reliability numbers are now exceptionally high, often reported as at least three "nines" (i.e., 99.9% reliable), but those figures grossly understate the problem. Modern applications typically depend on a complex computing environment that encompasses a wide variety of components, possibly including many servers, data storage devices, routers, and other technologies, any of which may fail at any time. Thus, even if each component is 99.9% reliable, a critical business application that depends on 100 pieces of hardware runs on an infrastructure that is not 99.9% reliable, but rather 99.9% raised to the 100th power, or a little under 90.48% reliable. That may not sound bad until you do the arithmetic: for a 24x7 operation, 90.48% reliability translates into about 834 hours of downtime annually. And that's just for "rare" hardware failures; it says nothing about software, network, or electricity grid reliability.
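The compounding effect is easy to verify. This short sketch reproduces the arithmetic above, assuming 100 components that each achieve three "nines" of reliability:

```python
component_reliability = 0.999   # three "nines" per component
components = 100                # components the application depends on

# Reliabilities of independent components multiply, so the system's
# reliability is the per-component figure raised to the component count.
system_reliability = component_reliability ** components

hours_per_year = 24 * 365       # 8,760 hours in a 24x7 operation
downtime_hours = (1 - system_reliability) * hours_per_year

print(f"System reliability: {system_reliability:.2%}")  # about 90.48%
print(f"Annual downtime: {downtime_hours:.0f} hours")   # about 834 hours
```

The key assumption is that component failures are independent; correlated failures (a shared power feed, for instance) can make the real picture even worse.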
The impact of all of that downtime on a company's profitability can be staggering. Industry studies show that, depending on the industry and the size of the company, downtime costs can range up to almost $3 million per hour.
Planned maintenance is another obvious downtime source. Because you can schedule maintenance for slow times, the hourly downtime costs tend to be lower than for unplanned events, but because maintenance is significantly more frequent than hardware and software failures, disasters, and human errors, the annual cost of planned downtime is generally considerably higher. And, with more companies operating around-the-clock thanks to the Internet and globalization, those planned downtime hours are becoming more costly no matter when they're scheduled, thereby further raising the annual cost.
It's largely the disaster headlines that have driven many companies to look more closely at business continuity, often dedicating full-time resources to address the related issues, not just in IT, but also in the areas of personnel, business processes, logistics, and non-IT equipment. As these new continuity groups delve into their mandates, they often quickly come to recognize two IT-related facts: First, their businesses are so completely dependent on IT infrastructure that business continuity can be guaranteed only by assuring the availability of data and applications. And, second, to ensure data and application availability, it's necessary to look beyond disasters and address all potential causes of downtime.
Merger and acquisition activity ebbs and flows, and anecdotal and statistical evidence suggests that it is on the rise again. It may not be intuitively obvious, but business consolidation tends to heighten the demand for highly available information technology. On reflection, the reason is clear: It costs little, if anything, more to protect the availability of a server that is responsible for processing $1 billion worth of business than one that is responsible for $500 million.
To reduce hardware, software, and personnel costs, consolidated businesses typically merge at least some of the information technology of the previously separate firms. When they do, each server becomes responsible for managing a higher value of business. Thus, the justification for protecting the availability of those servers becomes correspondingly easier.
A different type of consolidation trend is occurring in some firms that are not merger and acquisition participants. When low-cost servers first hit the market, the thinking was that money could be saved by switching from mainframes and high-end midrange systems to cheaper commodity servers. But the economies proved to be false. Hardware costs often were lower, but the cost of software licenses for the larger number of servers and, much worse, the cost of installing, maintaining, securing, and backing up all of those distributed servers frequently swamped the hardware savings.
Consequently, many firms are reconsolidating those distributed servers onto centralized midrange systems, sometimes taking advantage of multi-operating system capabilities, such as on the iSeries, to run different platforms on a single server. The effect on the demand for high availability from this form of consolidation is similar to the effect from merger and acquisition activity: As each server becomes responsible for a higher percentage of the company's revenues, the importance of protecting its availability increases proportionately.
Continuity and consolidation trends have raised availability on businesses' priority lists, but another dominant trend has made it a "must have" rather than just a "should have": an increasingly stringent regulatory environment. Compliance with new and strengthened regulations is now the focus of considerable corporate attention because failure to comply can bring substantial fines and, depending on the law broken, possibly also a jail sentence for the company's executives.
This article is not a detailed analysis of the relevant laws, nor is it written by a lawyer. You should, therefore, consult a legal expert when considering how regulations will affect your high availability requirements. In addition, the following primarily addresses the U.S. regulatory environment. Since most laws pertain to a single jurisdiction—typically, national, state/provincial, or local—if you're reading this from outside the U.S., further investigation of the relevant laws is advised.
Thanks in part to the accounting scandals of a few years ago and intensified concerns regarding security, businesses face a more stringent regulatory environment than in the past. Some of the new laws affect all businesses, some affect just public companies, while others are restricted to particular industries. A number of these regulations—including Sarbanes-Oxley (SOX), the Basel II Accord, the Basel Committee's Capital Adequacy Directive (CAD III), the Gramm-Leach-Bliley Financial Services Modernization Act, and the Health Insurance Portability and Accountability Act (HIPAA), among others—require that organizations pay close attention to the integrity and availability of their data.
Sarbanes-Oxley Act (SOX)
SOX was passed primarily in response to major accounting scandals. Regulations enacted under the act require that publicly traded companies report annually on the effectiveness of their financial controls. Accordingly, corporations now have a much tighter focus on ensuring that they have the proper controls and audit processes in place to prevent and detect fraud. The legislation imposes significant consequences for non-compliance, including civil and criminal penalties.
Two sections of the act are most relevant to data and application availability: Section 404, which demands that members of management certify their responsibility for financial controls and report on the adequacy and shortcomings of those controls, and Section 409, which requires the timely reporting of financial information. To fulfill these requirements, companies need highly available systems that reflect best practices and ensure the complete, accurate, and timely provision of information for financial reporting, audits, and fraud investigations.
Basel II Accord
The international banking regulations of the Basel II Accord apply to all European banks and investment firms, as well as about 20 of the most internationally important banks in America. To help achieve the objective of reducing the risk to banks' liquidity, Basel II specifically requires that banks protect the availability of their data. Furthermore, the Basel Committee's Capital Adequacy Directive (CAD III) requires that banks have information about their assets and associated risks readily available. Specifically, a background document produced by the Basel Committee states that "Banks should have effective capacity, business continuity and contingency planning processes to help ensure the availability of e-banking systems and services."
Gramm-Leach-Bliley Act
Eight agencies and the states are charged with managing and enforcing the regulations that stem from the Gramm-Leach-Bliley Financial Services Modernization Act of 1999, which applies to any organization that collects or transfers private financial information for the purpose of doing business or providing a service.
The regulation that is most relevant to the availability discourse is the Safeguards Rule, which governs processes and controls designed to protect customers' financial data. Among other requirements, it specifies that financial institutions must protect against any anticipated threats or hazards to the security or integrity of such information. Thus, financial institutions must protect against the destruction of customer and account data whether from equipment failure, disasters, or human error.
This rule, which is enforced by the Federal Trade Commission, threatens fines of thousands of dollars a day, not to mention public embarrassment, as a consequence of non-compliance.
Health Insurance Portability and Accountability Act (HIPAA)
HIPAA has a particular focus on health information privacy, but it also requires that participants in the healthcare sector protect the integrity and availability of any health information they collect, maintain, use, or transmit. The act defines the healthcare industry very broadly to include not just healthcare insurers and providers, but also healthcare information clearinghouses. These clearinghouses include practically all organizations that touch healthcare data in any way, including, for example, banks that process health claims, outsourcing firms that key in patient records or insurance policy data, and so on.
HIPAA precisely defines "availability" as "the property that data or information is accessible and useable upon demand by an authorized person." The inclusion of the phrase "upon demand" means that it is probably not sufficient to just back up data overnight and send it offsite. An organization that requires days to recover damaged data likely wouldn't be considered to have fulfilled the "upon demand" requirement.
Service Level Agreements
In many businesses, compliance means more than adhering to government-imposed regulations. In response to competitive pressures, many companies, particularly in the business-to-business market, voluntarily include Service Level Agreements (SLAs) in their sales contracts. These SLAs set legally binding benchmarks that vendors must meet.
There's no universal standard for the form of an SLA. For example, an application service provider (ASP) might include a clause that promises that its applications will be available to customers 99.9% of the time, around the clock. A hardware seller might promise that, in the event of a problem, replacement parts and/or a technician will arrive within a certain number of hours or days after the problem is reported. A supplier of raw material or components for a just-in-time manufacturer may guarantee deliveries within a certain timeframe after receiving an order. The number and variety of possible SLA clauses is limitless but, to be meaningful, they typically specify significant penalties that will be paid as compensation for lack of compliance.
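To see what a clause like the ASP's 99.9% promise actually permits, it helps to convert the percentage into clock time. A rough sketch (the 30-day month is an assumed simplification):

```python
availability_target = 0.999           # the SLA's 99.9% uptime promise
minutes_per_month = 30 * 24 * 60      # 43,200 minutes in a 30-day month
minutes_per_year = 365 * 24 * 60      # 525,600 minutes per year

# The unavailability budget is whatever the target leaves uncovered.
monthly_budget = (1 - availability_target) * minutes_per_month
yearly_budget = (1 - availability_target) * minutes_per_year

print(f"Allowed downtime: {monthly_budget:.1f} min/month")  # 43.2 min/month
print(f"Allowed downtime: {yearly_budget:.1f} min/year")    # 525.6 min/year
```

In other words, a 99.9% SLA leaves room for well under an hour of downtime per month; a single extended outage can exhaust the whole year's budget and trigger the contract's penalties.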
In the ASP example cited above, the need for system availability to fulfill the SLA terms is obvious and direct. In other cases, the connection may be less direct, but availability underpins many SLA terms nonetheless, because most businesses depend on their information technology. For example, a company that creates a customized SLA for each customer, digitizes it once signed, and stores it only electronically may not even be able to determine its specific SLA obligations if its systems are unavailable. Likewise, if the company needs to dispatch a technician or a spare part to a customer to fulfill an SLA commitment, doing so may be difficult or impossible without access to the systems that assign those resources. Consequently, as SLAs appear in an increasing number of contracts and their terms grow stricter, SLA compliance will become an increasingly significant factor in organizations' demands for higher availability.
The Growing Availability Imperative
A discussion of how to achieve higher availability is beyond the scope of this article. It is, however, clear that if you have not yet begun to look at that question of how, you likely will have to soon. The ongoing trends in the "three C's" of the contemporary business and technology environment (continuity, consolidation, and compliance) are conspiring to force availability higher on the business agenda and, consequently, on the IT agenda. If high availability is not an imperative in your organization today, it probably will be tomorrow.
Alan Arnold is President and COO of Vision Solutions. Prior to joining Vision in 2000, he was a senior technology executive and subject matter expert for IBM technology at Cap Gemini Ernst & Young U.S. LLC. Arnold is recognized as an expert in the field of managed availability technology. He has authored or co-authored five books on technology and business topics that have been published worldwide. He has also written numerous articles for some of the leading publications in the industry.