Have you ever been approached by a user with a data archive mystery? The user has been a trusted employee with the company for years, and he just wants to retrieve the information in an old file he has on his hard drive. It might be a word processing document in an old format, a spreadsheet document from a previous version of software, or even some figures associated with a once-popular PC database that is no longer supported. But it's important to him.
The Diskette Search
Fortunately, you are able to find the old install disks stashed in the storage room, and you reinstall the software without much difficulty. However, the program itself will not run, providing you only with a cryptic message about a missing file or module. Hmm...a mystery! Just what you were looking for on Monday morning!
Further investigation reveals that the software just will not run on the most recent version of the PC operating system, so you dutifully retrace your steps until you've loaded DOS from a 5.25-inch diskette on the only remaining machine that has such a diskette drive. Wow! How old is this data?
This too leads to problems, however, because the version of DOS that the program requires can't recognize the size of the 30 GB hard drive upon which the data is stored. In other words, the hard drive's capacity is beyond the capability of the operating system to read. Hmm! Time to dig a little deeper!
You fiddle around for a while until, at last, you've reconstructed a software/operating system environment that roughly duplicates what is required. But alas, you discover that this also leads to a dead end. Some data briefly ghosts across the screen, but then the program crashes with yet another cryptic error message.
You turn to your co-worker and ask, "How important is this data really?"
"Really important!" he replies.
After searching for hours on support networks on the Internet, you discover that the program will probably never run. Why? Because the program was created using an obsolete compiler that timed its calculations to the speed of a specific Intel coprocessor that, unfortunately, has not been manufactured in over seven years.
Reluctantly, you turn to your co-worker and tell him the news.
What's in the file? It's an electronic version of a financial transaction that happened seven years ago. The auditors want to inspect some figures.
Permanence: Such a Flighty Thing!
As absurd as this sounds, it is a real scenario that is becoming quite common in 2005. Programs that ran flawlessly only two or three years ago are now rapidly becoming obsolete, and the actual quantity of important data that is held captive by these systems is unknown. DOS, Windows 3, Windows 95, Windows 98, and Windows ME are no longer supported by Microsoft. Soon Windows 2000 will be tossed on the trash heap of obsolescence. Such is the nature of progress, we are told.
Yet new compliance legislation such as HIPAA and Sarbanes-Oxley (SOX) has some specific requirements for preserving data for long periods of time. Moreover, with new data retention policies, SOX in particular will necessitate the storage of more records, causing many corporations to completely reevaluate their storage management systems. To make matters worse, any and all electronic records are subject to SOX requirements, including email and Instant Messaging (IM) files. And while the new regulations stipulate the use of some data storage technology that cannot be overwritten or altered in any fashion, there is growing concern that such technology is not yet truly trustworthy.
IT has assumed that SOX rules call for the implementation of write-once-read-many (WORM) devices, but recent investigations by IBM researchers are demonstrating that simple WORM technology doesn't fill the bill.
In a yet-to-be-published IBM white paper entitled "Fossilization: A Process for Establishing Truly Trustworthy Records," IBM researchers Windsor W. Hsu and Shauchi Ong analyze the question of how to define the elements of a permanent system of data recording. They have come to the conclusion that WORM technology, by itself, is not sufficient to meet the requirements of establishing truly trustworthy records. This is because the interfaces and mechanisms that control the WORM technology itself can be spoofed with hardware and software filters that misrepresent actual data. This concern echoes the one that some technologists had over the use of proprietary voting machines in the last presidential election: The technology doesn't provide a true audit trail.
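To make the spoofing concern concrete, here is a minimal sketch in Python. It is not taken from the IBM paper; the WormStore and SpoofingFilter classes are hypothetical stand-ins for a write-once device and a filter sitting in its read path. It assumes a digest of each record is kept somewhere outside that read path, and shows that the medium can remain unaltered while the reader is still shown something different.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(data).hexdigest()


class WormStore:
    """Toy write-once store (hypothetical): each key can be written only once."""

    def __init__(self):
        self._blocks = {}

    def write(self, key: str, data: bytes) -> None:
        if key in self._blocks:
            raise PermissionError(f"{key} has already been written")
        self._blocks[key] = data

    def read(self, key: str) -> bytes:
        return self._blocks[key]


class SpoofingFilter:
    """Hypothetical layer between the reader and the store.

    The stored bytes are never modified; only the reported view is.
    """

    def __init__(self, store: WormStore):
        self._store = store

    def read(self, key: str) -> bytes:
        return self._store.read(key).replace(b"1000", b"9000")


store = WormStore()
record = b"2005-03-01 transfer amount=1000"
store.write("txn-1", record)
trusted_digest = digest(record)  # assumption: kept outside the read path

honest = store.read("txn-1")
spoofed = SpoofingFilter(store).read("txn-1")

print(digest(honest) == trusted_digest)   # direct read matches the digest
print(digest(spoofed) == trusted_digest)  # filtered read does not
```

The write-once property holds throughout, yet the filtered read still misreports the amount. Only the independent digest exposes the difference, which is why write-once media alone cannot vouch for what is delivered.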
"What is really needed," the IBM authors state, "is a process we call fossilization--a holistic approach to storing and managing records that ensures that they are trustworthy."
According to the authors, fossilization is composed of three parts:
- Fossilization of storage--Guaranteeing that all records and their associated metadata are reliably stored and securely protected from any modification
- Fossilization of discovery--Ensuring that all preserved records pertinent to an inquiry can be quickly discovered and retrieved
- Fossilization of delivery--Warranting that the exact pertinent records are delivered to the agent and that the records are delivered in an intact form
In other words, fossilization is not merely a medium for recording information, but a secure process for recording, protecting, and retrieving information that is repeatable throughout the lifecycle of the information. Like the natural processes that create organic fossils, IT needs a process by which it can "lay down" records into a permanent stratum that is immutable, yet can be retrieved well beyond the technological era that created them.
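As an illustration only, the three parts can be sketched as a small hash-chained log in Python. This is not the mechanism described in the IBM paper; the FossilLog class, its method names, and the metadata fields are all hypothetical. Each entry's hash covers its metadata, its payload, and the previous entry's hash, so altering any stored record breaks verification of the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, metadata: dict, payload: str) -> str:
    """Hash the previous hash, the metadata, and the payload together."""
    body = json.dumps(
        {"prev": prev_hash, "meta": metadata, "data": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class FossilLog:
    """Hypothetical append-only log covering storage, discovery, and delivery."""

    def __init__(self):
        self._entries = []

    def append(self, metadata: dict, payload: str) -> str:
        """Storage: chain each record and its metadata to the one before it."""
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        h = entry_hash(prev, metadata, payload)
        self._entries.append(
            {"prev": prev, "meta": dict(metadata), "data": payload, "hash": h}
        )
        return h

    def verify(self) -> bool:
        """Recompute every hash; any altered entry breaks the chain."""
        prev = GENESIS
        for e in self._entries:
            if e["prev"] != prev:
                return False
            if entry_hash(prev, e["meta"], e["data"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True

    def discover(self, **criteria) -> list:
        """Discovery: find every entry whose metadata matches the criteria."""
        return [
            e for e in self._entries
            if all(e["meta"].get(k) == v for k, v in criteria.items())
        ]

    def deliver(self, **criteria) -> list:
        """Delivery: return matching records only if the log is intact."""
        if not self.verify():
            raise ValueError("log failed its integrity check")
        return [(e["data"], e["hash"]) for e in self.discover(**criteria)]


log = FossilLog()
log.append({"type": "invoice", "year": 2005}, "amount=1000")
log.append({"type": "email", "year": 2005}, "re: audit request")

delivered = log.deliver(type="invoice")

log._entries[0]["data"] = "amount=9000"  # simulate tampering with a stored record
intact_after_tampering = log.verify()
```

After the simulated tampering, verify() returns False and deliver() refuses to hand anything over. A chain like this only makes tampering evident. Someone who controls the whole store could recompute every hash, so a real system would also need the latest hash anchored somewhere the store's operator cannot rewrite.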
In this concept of fossilization, then, to be permanent and immutable goes beyond simple WORM technology; it strikes at the heart of what our IT culture has evolved into. It's about building a technology that is impervious to fraud.
If we look at our data from the perspective of immutability, we can quickly see the outlines of our current record-keeping problem. We live and work in a culture in which our corporations float amid a sea of electronic and magnetized bits that are constantly tallying and scoring the wealth and well-being of our enterprises. These electronic bits must be constantly refreshed, yet they're so delicate that they can be obliterated or hacked and altered with the skill of a high school graduate. The magnetic records of those bits--stored on hard drives--are too easily manipulated. Even the optical storage of these bits can be misrepresented during the recording or the retrieval of the information, through filters or the use of metadata.
Consider the crime of electronic identity theft. It's a process by which a few megabytes of data are electronically manipulated by a criminal to transfer the wealth of one individual or corporation invisibly to another. The concept of "permanent records" in such a virtual electronic world is completely archaic today. These records are a myth. Everything is in transformation: our data, our processes, and even our technologies that do the measuring and the digitizing.
Change and the Limitations It Imposes
What is often missed in the analysis of such crimes as identity theft is that the measurements of our entire social and financial structures have been transformed in the last 20 years into a pile of vibrating, charged electrons that have little or no permanence at all. Our well-being and the well-being of our companies are being calibrated by an unbounded technology in constant change. And although the scandals of Enron and WorldCom taught us about the cost of fraud and manipulation of data, the lesson that we might need to learn is that we are entrusting our vital records and the wealth of our institutions to technologies that are barely older than the transactions that they are recording.
The Challenge of Archiving
This brings us back to the challenges of archiving. Technologists are coming to terms with the reality that the need for permanent and immutable record-keeping is being undermined by technologies that evolve too rapidly. Like our co-worker with the obsolete software who is looking to restore his precious data, unless we come to terms with the rate of technological innovation, we'll continue to erase our historical records with each new advancement. And while we can't keep one generation of technology from retiring the previous, it's becoming increasingly clear that we need some means to transfix records, like fossils, to preserve the value of what has been recorded.
Thomas M. Stockwell is Editor in Chief of MC Press Online, LP.