In many ways, the information that keeps your organization running is as important as the money you put in the bank. That is why the disk devices which hold your information have to be as sound as the banks that keep your money. If a processor fails, you lose valuable time; if your disk drives fail, you stand a chance of losing much more. As a result, selecting the right disk technology for your installation is one of the most important hardware choices you will make.
Selecting disk drives, or direct access storage devices (DASD), is more than an important task; it's an expensive one as well. Over the life of your system, there's a good chance you will spend as much on disk drives as on the system to which they are connected, or even more. AS/400 users in the United States will spend around $550 million on processors during 1993 and almost $650 million on storage devices. They will invest the lion's share of that $650 million in DASD.
Although IBM has cornered around 94 percent of the market for AS/400 DASD and more than 99 percent of the S/36 market (figures supplied by Computer Intelligence of La Jolla, California), this near-monopoly does not imply a shortage of competitive products. All of the vendors listed in our buyer's guide chart have fine reputations for quality and service. Before you can decide who will get your hard-won budget, you need to determine what type of DASD solution you need. Although we've only included AS/400 disk drives in the chart which accompanies this article, most vendors also offer S/36 and S/38 DASD solutions.
Critical Variables and Trade-offs
As you begin evaluating the disk products that are available on the market, you will discover two general truths about DASD. First, the best DASD purchase for your organization will not necessarily be the best purchase for other organizations. This is because each organization has different priorities in terms of disk capacity, transfer rates, system availability, data protection, performance and price.
The second truth you will discover is that every storage solution forces the buyer to accept some trade-offs between these priorities. You will need to determine which priorities are more important than others. As you make this determination, you should consider two critical variables: system availability and throughput.
System availability, the percentage of time the system is up and available, is important to all users. While many components affect system availability, disk drives have a major impact on this variable. In fact, a study of AS/400 sites by the Aberdeen Group of Boston, Massachusetts, revealed that DASD failures caused 64 percent of all system crashes among survey respondents last year.
The degree to which storage devices maintain data integrity when a storage failure occurs has a direct effect on system availability. Most AS/400 sites use DASD that requires a full recovery of the system after a disk failure. After recovery from backup, further work must be done to restore objects, libraries and access paths to their status just prior to the failure. Options that decrease downtime after a failure include data protection schemes built into the DASD, such as disk mirroring, RAID, hot spares and redundant power supplies. (Each of these terms is defined in the glossary.) OS/400 facilities such as disk mirroring (under software control), journaling, auxiliary storage pools and checksum can also improve data protection and system availability.
The level of throughput achieved by a disk storage solution may be difficult to obtain from vendors' literature. However, throughput is the most accurate measure of how the drive will actually perform. Throughput is impacted by hardware factors such as seek time, rotational latency and transfer rates, but data protection schemes such as checksum, mirroring and RAID also play an important role. In general, these efforts to improve system availability and data protection have a negative effect on performance because they use up CPU cycles or require additional reads and writes. They also take up valuable disk space, which raises the cost of storage.
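To see how these hardware factors combine, the rough sketch below adds seek time, average rotational latency and transfer time into a per-request service time. The drive figures (12 ms seek, 5,400 rpm, 3 MB per second, 4 KB blocks) and the function names are illustrative assumptions, not measurements of any product discussed here, and the model ignores queuing, caching and controller overhead.

```python
def service_time_ms(seek_ms: float, rpm: float,
                    transfer_mb_per_s: float, block_kb: float) -> float:
    """Approximate time for one random disk I/O, in milliseconds."""
    # On average the platter must turn half a revolution before the data arrives.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    # Time to move the block itself once the head is over it.
    transfer_ms = (block_kb / 1024.0) / transfer_mb_per_s * 1000.0
    return seek_ms + rotational_latency_ms + transfer_ms


def requests_per_second(service_ms: float, physical_ios: int = 1) -> float:
    """Requests one drive can finish per second if each request needs
    `physical_ios` disk operations performed one after another
    (a deliberate simplification)."""
    return 1000.0 / (service_ms * physical_ios)


# Hypothetical drive: every number here is an assumption for illustration.
t = service_time_ms(seek_ms=12.0, rpm=5400, transfer_mb_per_s=3.0, block_kb=4)
unprotected = requests_per_second(t)
# A protection scheme that turns one logical write into four physical I/Os.
protected = requests_per_second(t, physical_ios=4)

print(f"service time: {t:.2f} ms")
print(f"writes/sec without protection: {unprotected:.1f}")
print(f"writes/sec with 4 I/Os per write: {protected:.1f}")
```

Even this crude model shows why a scheme that multiplies physical I/Os can matter more to throughput than a modest difference in quoted seek times.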
It is relatively easy to create a low-cost storage solution on a system that does not use any data protection or system availability features beyond regular backups. However, these savings must be balanced against the cost of a data loss or system crash. It can be difficult to create an inexpensive storage solution that provides adequate protection against these dangers.
RAID: The Next Generation
As DASD buyers struggle in this balancing act, they will begin hearing more about a new "solution" to the problem: Redundant Arrays of Inexpensive Disks (RAID). RAID is actually a variety of techniques that use multiple disks to provide data protection via mirroring or parity. Many new RAID products will hit the market in 1993, generating lots of marketing heat. Over the next few years, however, it is highly likely that AS/400 disk vendors will continue producing non-RAID products that perform as well as RAIDs or outperform them. Therefore, improved performance is not the reason to get an AS/400 RAID; the real reasons are to protect your data and improve system availability. Some AS/400 shops use AS/400 mirroring or checksum for these very reasons. If you currently use these techniques or are considering their use, RAID could be a better alternative for you. If you are not using these techniques, RAID may not suit your needs or fit your budget.
Computer scientists from the University of California at Berkeley defined several different types of RAID drives in the late '80s. Three types of RAID are currently available for the AS/400. The first of these, RAID Level 1, ensures that the data on each drive is duplicated on another drive. RAID-1 drives handle all disk mirroring for the system, which reduces the performance impact of mirroring. However, they do not provide mirroring of controllers, I/O processors or buses.
RAID Level 3 protects data via the same parity information used by OS/400's checksum feature. In RAID-3, the parity data is written to a specific disk that handles this function for the entire array. RAID-3 is an effective method for transferring large blocks of data, though inefficient for the small blocks that are read by most AS/400 applications.
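A small sketch may help show how a dedicated parity disk lets an array survive the loss of one drive. It assumes the parity is a byte-by-byte exclusive OR (XOR) of the data disks, which is the usual approach in parity-based arrays; the exact algorithm inside any particular product, or inside OS/400's checksum, is not described here, and the disk contents and function name are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one five-byte block.
data_disks = [b"AS400", b"DASD!", b"RAID3"]

# The dedicated parity disk stores the XOR of all the data disks.
parity_disk = xor_blocks(data_disks)

# Suppose the second disk fails. XOR-ing the surviving data disks with the
# parity disk regenerates the missing block.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_disks[1])
```

Because every write must also update the single parity disk, that disk is touched on every update, which is consistent with the point that this layout favors large transfers over many small ones.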
RAID Level 5 technology stripes parity data across all disks in the array. The drives are also asynchronous, allowing each drive to read and write independently of the others. Because this is a more effective design for reading smaller blocks of data, RAID-5 is better suited for most AS/400 applications than RAID-3.
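One way to picture parity striped across all disks is to rotate the parity block to a different drive on each stripe. The rotation rule below is only one simple possibility, chosen for illustration; actual AS/400 RAID-5 products may place parity differently, and the function names are hypothetical.

```python
def parity_disk_for_stripe(stripe: int, disk_count: int) -> int:
    """Return the index of the disk holding parity for a stripe.
    Assumed rule: parity starts on the last disk and moves one disk
    to the left on each successive stripe, wrapping around."""
    return (disk_count - 1 - stripe) % disk_count


def layout(stripes: int, disk_count: int):
    """Build a grid of 'D' (data) and 'P' (parity), one row per stripe."""
    rows = []
    for stripe in range(stripes):
        parity = parity_disk_for_stripe(stripe, disk_count)
        rows.append(["P" if disk == parity else "D" for disk in range(disk_count)])
    return rows


for row in layout(stripes=4, disk_count=4):
    print(" ".join(row))
```

With parity spread this way, no single drive is involved in every parity update, so independent small reads and writes can proceed on different drives at the same time.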
When compared to RAID-1, RAID-5 is far less costly because data does not have to be mirrored. However, RAID-5 will not perform as well as RAID-1 using the same disk technology. This is because each write to a RAID-5 device actually requires two reads and two writes in order to update parity data. This write penalty is a problem with all parity schemes, including the OS/400's checksum facility.
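The two reads and two writes can be traced in a short simulation. This is a sketch under the assumption that parity is an XOR of the data blocks; the `CountingDisk` class and `small_write` function are hypothetical names, and a real controller would add caching and error handling that are left out here.

```python
class CountingDisk:
    """A toy in-memory disk that counts its physical reads and writes."""

    def __init__(self, blocks: int, block_size: int):
        self.blocks = [bytes(block_size) for _ in range(blocks)]
        self.reads = 0
        self.writes = 0

    def read(self, block_no: int) -> bytes:
        self.reads += 1
        return self.blocks[block_no]

    def write(self, block_no: int, data: bytes) -> None:
        self.writes += 1
        self.blocks[block_no] = data


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disk, parity_disk, block_no, new_data):
    """Update one data block and keep the stripe's parity consistent."""
    old_data = data_disk.read(block_no)        # read 1: old data
    old_parity = parity_disk.read(block_no)    # read 2: old parity
    # Remove the old data's contribution from parity, then add the new data's.
    new_parity = xor(xor(old_parity, old_data), new_data)
    data_disk.write(block_no, new_data)        # write 1: new data
    parity_disk.write(block_no, new_parity)    # write 2: new parity


BLOCK_SIZE = 8
data_disks = [CountingDisk(4, BLOCK_SIZE) for _ in range(3)]
parity_disk = CountingDisk(4, BLOCK_SIZE)

small_write(data_disks[0], parity_disk, 0, b"PAYROLL!")
small_write(data_disks[1], parity_disk, 0, b"INVOICES")

total_reads = sum(d.reads for d in data_disks) + parity_disk.reads
total_writes = sum(d.writes for d in data_disks) + parity_disk.writes
print(total_reads, total_writes)
```

Each logical write costs four physical operations regardless of how many disks are in the array, which is why the penalty applies to every parity scheme rather than to one vendor's design.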
As you decide whether or not RAID should be a part of your future, it is important that you know what RAID can and cannot do for your installation. For instance, while RAID outperforms older disk devices, the newest non-RAID devices are based on the same high-performance disks as those used for RAID devices. When not encumbered with the need to read and write parity data, they can frequently transfer data faster than a comparable RAID drive. The exception is RAID-1, which uses mirroring rather than parity for data protection.
Making a Decision
Because midrange customers have different priorities, the available storage solutions stress different capabilities. As Figure 1 indicates, storage solutions that excel in some areas do not fare as well in others. For instance, RAID-1 and system-level disk mirroring provide high levels of data protection and system availability along with reasonable performance, but are very expensive. The most recent standard DASD products, on the other hand, are inexpensive and offer high levels of throughput, but are at the bottom of the pile when it comes to data protection and availability. To improve data protection and availability on standard DASD, some vendors offer hot spares. However, this increases storage costs.
One way to strike a better balance between these critical variables on the AS/400 is through the use of Auxiliary Storage Pools (ASPs). This feature lets you divide disk storage into as many as 16 pools, which can then be treated as individual entities for the purposes of checksum and system-level mirroring. When a disk fails in a system using ASPs, you only need to recover the data in that pool to bring the system back up. This improves system availability considerably. ASPs can also reduce the cost of data protection in both dollars and lost performance, as they let you apply protection techniques only to those pools that contain mission-critical libraries.
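The saving from protecting only selected pools can be estimated with simple arithmetic. In the sketch below, the pool names and sizes are invented, mirroring is assumed to double the raw capacity needed, and checksum is assumed to cost one disk's worth of parity in an eight-disk set (a factor of 8/7). Treat all of these as planning assumptions rather than IBM figures.

```python
from dataclasses import dataclass

MAX_POOLS = 16  # the pool limit quoted in this article

# Raw capacity needed per gigabyte of data, under the assumptions above.
OVERHEAD = {"none": 1.0, "mirroring": 2.0, "checksum": 8 / 7}


@dataclass
class Pool:
    name: str
    data_gb: float
    protection: str = "none"


def raw_capacity_gb(pools) -> float:
    """Total raw disk capacity required for the given pools."""
    if len(pools) > MAX_POOLS:
        raise ValueError(f"at most {MAX_POOLS} pools are allowed")
    return sum(pool.data_gb * OVERHEAD[pool.protection] for pool in pools)


# Hypothetical installation: protect only what is mission-critical.
selective = [
    Pool("system", 2.0, "mirroring"),
    Pool("orders", 3.0, "checksum"),
    Pool("history", 5.0, "none"),
]
# The same data with every pool mirrored.
everything_mirrored = [Pool(p.name, p.data_gb, "mirroring") for p in selective]

print(f"selective protection: {raw_capacity_gb(selective):.1f} GB raw")
print(f"mirror everything:    {raw_capacity_gb(everything_mirrored):.1f} GB raw")
```

The same approach extends to performance: only the pools with a protection scheme pay its extra reads and writes.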
While ASPs make the choices between storage solutions less difficult, they cannot eliminate the trade-offs that must be made between critical variables. As part of the decision process, you may want to take a closer look at the data you consider to be mission-critical. What would it cost your company in lost business opportunities, system downtime and recovery expenses if that information were lost or damaged? The answer to this question may tell you how far you should give up low storage prices in exchange for higher data protection and availability.
Looking Down the Road
Anyone who remembers the storage units of a decade ago knows that disk technology changes rapidly. Just as yesterday's drives look like washing machines compared to today's sleek cabinets, so will tomorrow's DASD make current technology cumbersome and obsolete. Within two to three years, expect to see arrays housing drives with diameters of two inches or less, containing three to four times the data in the same physical space as today's newest models. These units will handle parity writing efficiently, making RAID protection almost as affordable as storing data without RAID.
In the meantime, there are still some distinct trade-offs in price, performance and reliability between a wide variety of different storage technologies. However, with a little ingenuity and shrewd bargaining, you can use this variety to get the right disk devices for you at a price that's right.
Glossary
Auxiliary Storage Pools (ASPs) - An OS/400 facility which lets the user divide disk storage into up to 16 pools.
Checksum - A software data protection implementation which groups disks into sets of two to eight disks, then spreads parity information across the disks in the set. In case of a disk failure, the system uses the checksum algorithm to recreate the data from the failed disk.
Disk Mirroring - The duplication of the contents of one or more disks on a second group of disks. The AS/400 has built-in software facilities to mirror disk devices as well as disk controllers, I/O processors and system buses.
Formatted Capacity - The capacity of a disk device, usually stated in megabytes (MB) or gigabytes (GB), after the disks are formatted. This capacity is usually smaller than the unformatted capacity, which vendors may also use in their marketing literature.
Head-Disk Assembly (HDA) - A complete unit consisting of disk platters, actuators, read/write heads and the electronics needed to connect the unit to a controller. Most midrange disk models contain many HDAs built into a cabinet that fits into IBM's standard 9309 rack.
Hot Spares (also called Hot-Pluggable Replacements) - A data protection and recovery technique in which a head-disk assembly is reserved as a spare within an array of drives. If one of the drives in the array shows signs of an impending failure, the data on the failing drive can be written to the spare, which then becomes active. Hot spares are only as good as the array's ability to predict a failure.
Mean Time Between Failures (MTBF) - The average amount of time a component works without failure.
Redundant Arrays of Inexpensive Disks (RAID) - A data redundancy method which combines an array of disks with data protection techniques built into the disk unit.
Redundant Power Supplies - A method of protecting a disk array from going down by adding reserve power modules to the unit. The reserve modules can take over in case one of the primary modules fails.
Rotational Latency - The average time a read/write head must spend waiting for data after the head is positioned on the correct track.
Seek Time - The time required for a disk device to move its read/write head to the track that holds the data. While most vendors give the average seek time, some cite minimum seek times, so make sure you know what you are being quoted.
Decision Data, One Progress Ave., Horsham, PA 19044; 800-933-9897; fax: 215-674-9543
EMC Corp., 171 South St., Hopkinton, MA 01748; 508-435-1000; fax: 508-435-8900
IBM Corp., contact your local branch
IPL Systems, Inc., 60 Hickory Dr., Waltham, MA 02154; 617-890-6620; fax: 617-890-0059
Memorex Telex Corp., Mid-Range Systems Group, 5800 Campus Circle, Irving, TX 75063; 214-580-7500; fax: 214-580-8266
XL/Datacomp, Inc., 908 N. Elm St., Hinsdale, IL 60521; 800-323-3289; fax: 708-323-2104
Buyer's Guide: Midrange DASD
Figure 1: A Report Card on IBM Midrange Storage Strategies
Strategy                      System Availability   Data Protection   Performance   Cost
Dual Mirrored Systems         A                     A                 B-            D
System-Level Disk Mirroring   A-                    A-                B-            C-
RAID-1                        B+                    A-                B             C-
RAID-5                        B                     B+                B-            B
Checksum                      C                     B+                C             B
DASD with Hot Spares          B-                    B-                B+            B
DASD without Hot Spares       D                     C                 B+            A