IBM has staked a lot on the computing model called On Demand and, over the last five years, it has invested heavily in developing the tools and services necessary to propel the architecture. One of IBM's more crucial business strategies is to engage government entities at both the federal and the local levels in expanding the expertise and services of their e-government offerings. In the United States, this e-government initiative stretches back to the Clinton administration, but it has been more fully embraced recently, in the aftermath of 9/11, by the Bush administration.
However, if there were flaws in the On Demand model, they were certainly exposed in the recent hurricane disaster that has devastated the coastal cities of Mississippi and Louisiana. And there are lessons to be learned. These lessons address a number of key questions: What happens when the underlying infrastructure for electronic technology fails? What happens to the delivery of services? What happens to the structure and efficiency of centralized control and command?
On Demand in a Disaster Recovery Scenario
In a business environment, an infrastructure failure in an On Demand environment merely causes a loss of business. In severe situations, it may even result in the ultimate failure of the enterprise. In the recent natural disaster, however, when the government relied upon On Demand technologies for collaboration and coordination, Katrina demonstrated that an infrastructure failure can and will cause loss of life, the demise of law and order, and the suffering of untold thousands. Compared with the tasks facing FEMA in the aftermath of Katrina, IT disaster recovery planning seems terribly trivial. Yet, when we witness the results of FEMA's response, it's hard not to think that IT would have handled it all very differently.
This is not to criticize the On Demand computing model, nor to pretend that the physical disaster caused by Katrina might have been avoided. Such terrible disasters will occur regardless of the computing model that has been chosen to deliver disaster services.
But Katrina and its aftermath should be wake-up calls to remind FEMA administration that it must plant its feet back in the real world, and IT's experience with disaster recovery contingency planning may be an interesting starting point. Sure, the scope is significantly different, the stakes infinitely more important, and the tools needed more diverse. Still, every IT manager knows that it's the outcome that measures the success of a disaster recovery contingency plan, not how well the management is informed of the costs.
Centralized Control and Command
In Katrina's case, too often the centralized command and control of rescue operations actually prevented workers on the ground from accomplishing their missions. Too often, the On Demand services delivered did not meet the needs of the victims.
The media has, for better or worse, portrayed the resulting problems of New Orleans as simple bureaucratic mismanagement. Indeed, one cannot dismiss this conclusion. And yet, in many cases, it was the underlying goals of the government's On Demand model that also failed.
Why did it fail? Because the government's On Demand model never met the requirement of delivering positive outcomes for the victims. Instead, it was focused upon delivering information to the FEMA controllers--the "management" administrators--enabling them to exert micro-control over the delivery of services, despite the needs of rescue workers on the ground.
Controlling the Intelligence
Consider the fact that rescue workers at the state and local levels were never fully coordinated into a master disaster plan of action. Such contingency plans had been in place for the city of New Orleans in previous years. Unfortunately, funding for updating those contingencies was eliminated during the aftermath of 9/11 in the first years of the Bush administration. This lack of an up-to-date, comprehensive, regional contingency plan meant that rescue workers were forced to rely upon outdated instructions that were often geared to previous generations of electronic technology. Meanwhile, the federal government had implemented newer technologies to gather intelligence that no longer meshed with the older contingency plans.
Furthermore, consider that the new On Demand model implemented by FEMA did not integrate with the mechanisms of the Red Cross and local and municipal rescue teams. Instead, the services that were to be delivered were focused on obtaining information about the disaster while often merely delivering tokens of service to the victims.
In other words, they had centralized the model of delivering services but had not provided a redundant method for local officials to use those services.
Failing Delivery Services
Consider, as an example, that FEMA's centralized control of the emergency actually blockaded rescue workers in one parish from delivering water, food, and diesel fuel for running generators because those resources were not registered in its information system.
Consider, as another example, that rescue workers were instructed to chart the coordinates of corpses floating in the streets of New Orleans with handheld radio GPS devices--devices that sent the information to a centralized FEMA database--but that these same rescue workers were prevented by FEMA from actually recovering the bodies. FEMA could find the bodies, but it could not deliver the vehicles or the personnel to recover them.
Consider that, despite the overwhelming amount of information that FEMA had about the risk of a storm like Katrina, about the certain knowledge that the levees would fail in the aftermath, about the estimated numbers of citizens who would have no transportation out of New Orleans, and about the masses of evacuees who were still seeking shelter at the New Orleans Convention Center, still no means seems to have existed for FEMA to actually effect a rapid and coordinated rescue.
In other words, the On Demand model delivered the information to FEMA very well; the agency was awash with information. Yet transforming that information into a multi-threaded response to achieve its primary mission of rescue seemed beyond its capabilities.
Acronyms for Failure
The acronym FEMA stands for Federal Emergency Management Agency. In the aftermath of Katrina, its name might well have been altered to Federal Emergency Information Management Agency, for certainly information management was the only task it achieved.
Yet every IT manager worth his salt knows that managing the information about a disaster--whether it be a DASD failure, a complete system meltdown, or a catastrophic collapse of the business infrastructure--is only the beginning of handling the calamity. In an IT disaster recovery scenario, delivering the means of recovering the environment is more important than knowing the extent of the damage. And here FEMA's On Demand service delivery model clearly failed.
Tokens of Service in a Disaster
For example, we have all heard of the pilot program that FEMA promoted during the initial days following Katrina: the distribution of ATM cards to the victims of the storm. The idea was that these citizens could each withdraw up to $2,000 from a government account to manage their immediate needs.
Although FEMA later acknowledged that this was only a pilot program not intended for every victim of the disaster, the more important point seems to have eluded the administrators. In a disaster of Katrina's magnitude, cash has little or no value, and ATMs--if they are functioning at all--will quickly be emptied. Meanwhile, regardless of the availability of paper money or credit, food, water, shelter, and safety cannot be purchased at any price. A small piece of credit card plastic in such circumstances may be a flashy token of service, but it's certainly an absurdly fragile safety net.
The equivalent in the aftermath of an IT disaster might be restoring access to the Internet when the mission-critical applications are still down. It's a nice thing to have--maybe even important for some users--but it's not a measure of success. In the light of a real disaster, it's a measure of incompetence.
In Katrina's case, providing plastic credit to victims was a flashy service. But it wasn't the measure by which FEMA would be judged. Rescuing citizens in need and restoring a system of law and order along with basic mission-critical systems was!
The On Demand Delivery Mechanism
And yet, when one looks at the underlying service delivery model, delivering credit cards is exactly the sort of On Demand service mechanism IBM (and others) has pushed onto the federal government. After all, keeping people tied into the ether-economy through credit vouchers is what information technology companies are all about. Great concept in the abstract! Little value under the circumstances!
Don't get me wrong. I have the greatest respect for the brave teams of rescuers, admiration for the dedication of those attempting to manage the catastrophe on the ground, and even grudging respect for the long hours put in by administrators and workers on the FEMA team. However, there's an old adage: "When all you have is a hammer, all your problems look like nails." In this case, FEMA seemed reduced to relying completely upon the On Demand model for managing the disaster. As a result, though the agency and its administrators took a beating, it was the poor suffering victims of Katrina who got screwed.
Lessons of Contingency
By comparison, in our IT environments over the years, we have learned the lessons of disaster recovery through trial and error. Though individually we could never manage an event like the horrible aftermath of Katrina, it is still a wake-up call to examine our reliance upon new technologies in our contingency plans. What happens if they fail? What are the real outcomes that our management and our users require us to deliver? Is our contingency plan up to the challenge? Is it up-to-date with the technologies we are using? Can it withstand and deliver, even in the most critical of circumstances?
We long ago learned, by anticipating the unthinkable, that we should not rely solely upon a single technology to rescue us or to enable us to recover. We've learned that there is no substitute for a comprehensive disaster recovery plan, with trained people in place and alternate backup facilities available. And we expect to roll up our sleeves and get dirty in the process. Most importantly, we know we can't ignore the signs of impending disaster when we discover them, and we know we must think proactively about how we might recoup our losses after a disaster.
Because, without such a contingency plan, we in IT, like FEMA in New Orleans, will be reduced to praying for blue skies after Mother Earth has wrought her wrath.
Thomas M. Stockwell is Editor in Chief of MC Press Online, LP.