|Rethinking Data Protection|
|System Administration - High Availability / Disaster Recovery|
|Written by Andrew Winkler|
|Sunday, 13 July 2008 19:00|
Can your storage provider guarantee that your data won't be lost?
Lose data, lose your job. Since 93 percent of data-intensive businesses that lose their data go out of business, it's not surprising that data loss is such a big career killer. And it's only natural that we have built a massive infrastructure around preserving data at all costs: RAID arrays and tape libraries and snapshot file systems and offsite storage and online storage.... So massive, in fact, that it is collapsing under its own weight.
Our storage architectures today reflect a fundamental reality of yesterday, when all data was incredibly valuable. To illustrate, when I started computing in 1973, it was on a system with 32K of memory that cost around $100,000. At these prices, which in those days were considered great bargains, companies could only afford to store massively valuable data. Today, that kind of money buys terabytes of fast memory and petabytes of disk space. And just as the acid test of writing in the physical world is that it is "worth the paper it is printed on," it makes sense in the digital world to store only data that is worth more than the cost of the disk space it is stored on plus all of the other overhead costs. These costs include systems to hold the disks, administrators to manage the systems, and support and backup infrastructure for all of them.
On the other hand, to fail to save any data that is more valuable than the cost of storing it is throwing away money. In summary, if its value per byte is bigger than the cost per byte of storing it, you keep it. If it is less, it was never economical for you to put it on disk in the first place. The maximum value per byte of information we store, however, is unaffected by these costs. An archive that is worth a million dollars is worth a million dollars whether you put it on a flash drive, or burn it on a CD, or copy it to tape. The result of this ever-decreasing cost of disk space is that the ratio between the most valuable data that we possess and the least valuable data that we choose to store increases just as fast as the cost is decreasing. This is true even if any one specific datum is decreasing in value; its place is continually taken by some new piece of information that is just as valuable as the old was.
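The keep-or-discard rule above reduces to a one-line comparison. A minimal sketch, where the specific prices, values, and the 3x overhead multiplier are illustrative assumptions rather than figures from the article:

```python
# Keep data only if its value per byte exceeds the fully loaded cost
# per byte of keeping it on disk (disk + systems + admins + backup).

def worth_storing(value_per_byte, raw_cost_per_byte, overhead_multiplier=3.0):
    """overhead_multiplier is an illustrative guess at the systems,
    administration, and backup costs layered on top of raw disk."""
    total_cost_per_byte = raw_cost_per_byte * overhead_multiplier
    return value_per_byte > total_cost_per_byte

# A million-dollar, 1 GB archive is worth storing at almost any disk price.
print(worth_storing(1_000_000 / 1e9, 0.10 / 1e9))  # True
# Data worth a tenth of its storage cost never belonged on disk.
print(worth_storing(0.01 / 1e9, 0.10 / 1e9))       # False
```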
This central reality is a direct consequence of Moore's law, which describes the fourfold increases in transistor density that have occurred in roughly three-year cycles over the last several decades. If our storage cost drops by a factor of four, then the gulf between lowest and highest values, expressed as a ratio, also increases by a factor of four. Our data storage architectures were designed for data a million times more valuable than our least valuable data today. And as disks get cheaper, we spend more total money, not less. Our total capacity increases by at least that factor of four. This additional threefold capacity fills up with data that we chose not to store at the fourfold higher price, suggesting that the bulk of our data generally has value at the lower threshold of our data values. So the bulk of our data, our least valuable data, is protected with an architecture designed for our most valuable data. This is massively inefficient.
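The compounding in this paragraph is easy to make concrete. Assuming the article's fourfold cost drop per three-year cycle, the ratio of most- to least-valuable stored data grows by the same factor each cycle:

```python
# If storage cost drops 4x per 3-year cycle, the cheapest data we can
# afford to keep gets 4x less valuable each cycle, so the ratio of
# most- to least-valuable stored data grows 4x per cycle.

def value_ratio_growth(cycles, factor=4):
    """Ratio of highest- to lowest-value stored data after `cycles`
    three-year cycles, relative to a starting ratio of 1."""
    return factor ** cycles

# Ten cycles (30 years) of fourfold drops yields a ratio of about a million,
# matching the article's "a million times more valuable" claim.
print(value_ratio_growth(10))  # 1048576
```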
Either we inadequately protect items of great value, risking catastrophic loss, or we overprotect items of lesser value, wasting money and losing valuable information that we might have been able to afford to keep but did not.
In the physical world, we solved these problems a long time ago. For mountains of gold, there was the proverbial Fort Knox, literally protected by an army. For us mere mortals, there is the bank vault. If you can't afford to keep it all there, maybe you have a safe or a locked cabinet. Less valuable things go to the attic, then the garage, or perhaps a shed. Some things you don't mind just leaving in the yard. For the things we can't afford to lose--our house, our car, or our health--we buy insurance. For a fee, someone else agrees to suffer for us the financial consequences of loss and make us whole. In other words, we don't just manage these risks by installing sprinklers, driving carefully, and eating healthfully; we also transfer the risk.
What makes that so significant is that it provides an economic mechanism for rationalizing costs. If you could not insure your house, how much would you have to spend and how hard would you have to work to be absolutely sure that it would never burn down? If you spent too little, you might not know that until it's too late. How would you ever know that you were spending too much?
So why not buy insurance for data? Traditionally, both insurance that covers property loss and comprehensive general liability (CGL) insurance form the first line of defense for businesses against unexpected financial loss. However, property loss insurance for data is not available anywhere. While business continuity insurance is available and may compensate you for some downtime while you attempt to recover from a disaster, it is just a stop-gap. Some policies may pay for the cost of attempting to recover lost data, but if you are not successful, you could end up with a room full of replacement systems devoid of data.
At first sight, it might seem a no-brainer for an insurance company to offer insurance that covers loss for the value of data. A moment's thought, however, exposes the dangers that face an insurance company taking that step. The term used by insurance companies to describe these dangers is moral hazard. The moment an insurance company issues an insurance policy to a restaurant, for example, the probability of a fire occurring instantly rises, even if the restaurant owner is perfectly honest and would never even think of torching his business deliberately. The mere fact that you have insurance lets you relax your guard. You buy insurance so that you can sleep at night, but all the time you spend sleeping is time you are not spending being hyper-vigilant.
All of these same moral hazard issues apply to data with a crucial additional facet: Data is easy to copy and easy to hide. There is no way to torch the restaurant while hiding a copy of it somewhere else. For the most part, insurance companies do not regard data as tangible property, and only loss or damage to tangible property can be covered under traditional insurance policies.
Some companies have attempted to use CGL insurance to cover their losses when they have been sued by third parties for data loss or have been harmed by downtime. However, here again, most insurance companies do not see data loss as a physical loss or as damage to tangible property, and thus these kinds of losses are not covered under a CGL policy either. In general, the courts have upheld this interpretation, saying that computer data is not physical or tangible property because it cannot be "touched, held, or sensed by the human mind."
Some insurers are now writing data loss exclusions specifically into their CGL policies; others will offer some kinds of liability insurance for data loss for an extra premium but are careful to stipulate many security measures that must be in place for a claim to be valid. But even if you can meet all of the onerous requirements of a CGL policy covering data liability, buying it to protect yourself against data loss is like carrying only liability insurance on your car: you are covered against the damage claims of others but not for loss of or damage to your own car.
For the value of your data itself, companies have had no alternative but to self-insure. Recently popular additional tools for managing the risk of self-insuring include offsite tape storage and online data storage. For example, Iron Mountain, in addition to cleaning out your closets and hauling away boxes of paper for storage, will happily take your tapes too. More recently, they have been purchasing online storage businesses as well. Online storage solutions geared for business data generally range in price from about $2 per GB to $10 per GB. Symantec offers up to 10GB for just $9.99 per month, but like many cell phone plans, their rate almost doubles if you go over the purchased capacity in any given month.
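The cell-phone-plan pricing mentioned above can be sketched as a simple tariff function. The $9.99 base rate and 10GB allowance are as quoted; the overage rate below is an invented figure chosen only to illustrate how the bill "almost doubles" past the purchased capacity:

```python
# Flat base rate up to an included allowance, per-GB charge beyond it.
# The $2.00/GB overage rate is an illustrative assumption, not Symantec's.

def monthly_cost(gb_used, included_gb=10, base=9.99, overage_per_gb=2.00):
    extra_gb = max(0.0, gb_used - included_gb)
    return base + extra_gb * overage_per_gb

print(round(monthly_cost(8), 2))   # 9.99  -- within the plan
print(round(monthly_cost(15), 2))  # 19.99 -- overage nearly doubles the bill
```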
The advantage of storing tapes offsite is the power of independence. Your building could be hit by a hurricane, and so could the tape warehouse, but if they are far enough apart, chances are they won't both be hit at the same time. That's how independence works. The virtue of tape is that it is relatively portable, but the disadvantages are significant. Tape is a devilishly fragile medium that tends to lose information over time and can be foiled by normal manufacturing variations between apparently identical drives. A tape written on one drive may be unreadable on another, virtually identical, drive.
The logistics of handling physical media create additional nightmares. Trucks get stolen, tapes get lost or mislabeled, tapes take too long to come back, and too often when they do get returned, they are not the tapes you wanted.
Moreover, the critical independence you were trying to buy may be an illusion. Some companies that suffered losses in the World Trade Center discovered that their offsite storage was actually located in the same building complex.
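The independence argument above, and how correlation destroys it, is just multiplication of small probabilities. A minimal illustration, where the per-site annual disaster probability is invented for the example:

```python
# Two sites that fail independently with annual probability p each lose
# both copies with probability p * p. Sites that secretly share a
# building complex fail together with probability close to p itself.

p = 0.01  # illustrative annual disaster probability per site, not a real figure

independent_both = p * p  # roughly one simultaneous loss per 10,000 years
correlated_both = p       # the "independence" was an illusion

print(independent_both < correlated_both)  # True: real separation buys a 100x margin
```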
The advantages of online storage may include faster recovery times, greater geographical separation, and significantly greater convenience. You transfer critical files over the Internet to a remote, hopefully well-run and well-protected site. The cost of online storage has been dropping.
Carbonite, for example, aims for the consumer market with a flat rate for all you can back up. Consumer-oriented services achieve a low cost structure by extensive use of de-duplication technologies. There is an enormous amount of duplication among consumer PCs all running the same operating system, having the same software, and gathering the same collections of pirated music and pornography. There is no point in having a million copies of each, one for each of your million customers.
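One common way this kind of de-duplication is implemented is content-addressed storage: blobs are keyed by a hash of their contents, so identical files from different customers are stored exactly once. A minimal sketch, not any particular vendor's implementation:

```python
import hashlib

class DedupStore:
    """Store blobs keyed by SHA-256 of their content: a million identical
    files from a million customers occupy the space of one."""

    def __init__(self):
        self.blobs = {}    # content hash -> data (one copy per unique content)
        self.catalog = {}  # (customer, filename) -> content hash

    def put(self, customer, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # stored once, however many owners
        self.catalog[(customer, filename)] = digest

    def get(self, customer, filename):
        return self.blobs[self.catalog[(customer, filename)]]

store = DedupStore()
for customer in range(1000):
    store.put(customer, "song.mp3", b"the same popular song")
print(len(store.blobs))  # 1 -- a thousand customers, one stored copy
```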
The rub is that this trick won't work if files are encrypted. So your file may be encrypted while it is being uploaded, and it may be encrypted again when it is finally stored, but is it ever unencrypted once it leaves your hands? If so, that is no solution for business data.
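Why encryption defeats the trick is easy to demonstrate: two customers encrypting the same file under different keys produce different ciphertexts, so the content hashes no longer collide. The sketch below uses a toy XOR "cipher" purely for illustration; a real service would use AES, but the effect on the hashes is the same:

```python
import hashlib

def toy_encrypt(data, key):
    """Toy XOR stream cipher -- illustration only, NOT real cryptography."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"the same popular song"
alice_ct = toy_encrypt(plaintext, b"alice-key")
bob_ct = toy_encrypt(plaintext, b"bob-key")

# Identical plaintexts, but per-customer keys yield distinct ciphertexts...
print(hashlib.sha256(alice_ct).hexdigest() ==
      hashlib.sha256(bob_ct).hexdigest())  # False
# ...so a de-duplicator sees every encrypted copy as unique and stores them all.
```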
Despite the deficiencies, these approaches are widely used and useful. While data losses can and do occur, your disaster recovery chances are better with these tools than without them. The key defect in employing these approaches, however, becomes visible in the light of the insurance analogy. While they transfer data, they don't transfer risk. This is highlighted by examining a typical contract for an online storage company. They are not responsible for losing your data even if it was the result of their own negligence. You could pay them to store some archive for years; if they could not return it, they might refund a month or two of the years of fees you have been paying.
The best approach is to transfer risk and data together. Transferring risk without transferring the data is what an insurance company would go out of business trying to offer. Transferring the data without transferring the risk is what offsite and online storage companies offer, which leaves you holding the bag when data losses do occur. In this approach, you transfer the risk by declaring the value of your data when you send it. You receive back a signed certificate guaranteeing the storage of that particular file for that particular value. Should you be unable to retrieve your file, you get a check for the value you placed on it.
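A storage certificate of the kind described could be as simple as a value declaration signed by the provider. The sketch below uses HMAC over a JSON claim; the certificate fields and the choice of HMAC are illustrative assumptions (a real service would likely use public-key signatures so customers could verify certificates without the provider's secret):

```python
import hashlib
import hmac
import json

PROVIDER_KEY = b"provider-secret-signing-key"  # illustrative shared secret

def issue_certificate(file_hash, declared_value_usd):
    """Provider signs a guarantee: this file, stored at this declared value."""
    claim = json.dumps({"file": file_hash, "value_usd": declared_value_usd},
                       sort_keys=True)
    signature = hmac.new(PROVIDER_KEY, claim.encode(), hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}

def verify_certificate(cert):
    expected = hmac.new(PROVIDER_KEY, cert["claim"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["signature"])

archive_hash = hashlib.sha256(b"customer archive contents").hexdigest()
cert = issue_certificate(archive_hash, declared_value_usd=1_000_000)
print(verify_certificate(cert))  # True: if the file can't be returned, the claim pays out
```

A tampered claim, such as an inflated declared value, fails verification, which is what makes the certificate a binding record of what was promised.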
You buy as much or as little protection as that particular archive merits, starting at pennies per month, with a charge based on value. This lets you choose the cost of preserving each of your files. An infinitely scalable storage architecture automatically creates a protection level commensurate with the value.
In addition to solving the immediate problem of letting you offload your data loss worries completely, this approach lays a new foundation on which a much leaner, more adaptive IT architecture can be built. Information can now be preserved according to its value, and the overhead appropriate to our most valuable data need no longer limit our capacity to embrace ever vaster quantities of newly affordable information.
|Last Updated on Friday, 11 July 2008 05:25|