The Linux Letter: The Open-Source Storage Management Option


I participate and lurk in a significant number of technical mailing lists. Lately, I have noticed an uptick in the number of queries that seem to ask the same question: "Is there an enterprise-grade storage management package that is open-source and has all of the features of x (where x is a commercially available package)?" I don't know what the impetus of this interest might be, but the short answer to this question is simple: no. That's not to say that there aren't any quality entries in the data backup and restore category (a subset of a true storage management package), but in terms of something like IBM's Tivoli Storage Manager (TSM), the truth is that nothing in the open-source community comes close to the depth and flexibility that TSM provides. Of course, there is nothing in the open-source community that comes close to the expense, requirements, and complexity of TSM, either. That said, you may find that an open-source alternative will serve your needs just fine.

Even if you've already deployed TSM, you should still examine the open-source entries. Some can fulfill the needs of niche systems well. Better still, these open-source products can be deployed without concern for licensing and associated fees, so you can readily add protection against data loss to servers and workstations, whether or not they are already protected by a commercial product. You can even get creative and lower your license counts by using an open-source product to gather data from many machines onto one, where your commercial product can pick it up.

The open-source alternatives vary in capability and complexity, from very simple to very complex, with some approaching the complexity of TSM. There are packages that take advantage of the tools that come with any Linux distribution (and port to other operating systems) as well as more-sophisticated packages that can create save volumes on multiple tapes and disk units and do cross-platform backups. Let's look at some of the contenders.

Simple, Yet Powerful

An example of a simple yet powerful package is rsnapshot. Written in Perl, it leverages the rsync utility to create a sweet little backup-to-disk program that is capable of saving files from your Linux and UNIX servers into point-in-time volumes (directories). Rsnapshot takes advantage of hard links (multiple directory entries that point to the same data) to give the appearance of multiple full backups, yet it requires only enough disk capacity to store the complete data set plus any changed files. You can have the illusion of full hourly, daily, weekly, and monthly backups without having the physical space to hold that many copies. I use rsnapshot on my laptop in conjunction with a USB drive to keep my project data saved, but I could just as easily save my data across a network to another system or have my laptop polled by an rsnapshot instance from another system.
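
To make the hard-link idea concrete, here is a minimal Python sketch. It is not rsnapshot's code (rsnapshot is written in Perl and lets rsync do the work), and the function and directory names are made up for illustration. It assumes a filesystem that supports hard links. Unchanged files become a second name for data already stored in the previous snapshot, and only new or changed files consume additional space.

```python
import filecmp
import os
import shutil
import tempfile


def make_snapshot(source_dir, previous_snapshot, new_snapshot):
    """Build new_snapshot from source_dir (hypothetical helper).

    Files identical to their copy in previous_snapshot are hard-linked;
    new or changed files are copied.
    """
    for root, _dirs, files in os.walk(source_dir):
        rel = os.path.relpath(root, source_dir)
        target_dir = os.path.normpath(os.path.join(new_snapshot, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            old = None
            if previous_snapshot:
                old = os.path.normpath(os.path.join(previous_snapshot, rel, name))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: another name for the same data
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy


def demo_snapshots():
    """Take two snapshots and report which files share storage."""
    with tempfile.TemporaryDirectory() as work:
        source = os.path.join(work, "source")
        os.makedirs(source)
        with open(os.path.join(source, "a.txt"), "w") as f:
            f.write("alpha")
        with open(os.path.join(source, "b.txt"), "w") as f:
            f.write("beta")

        older = os.path.join(work, "snapshot.1")   # illustrative names only
        newer = os.path.join(work, "snapshot.0")
        make_snapshot(source, None, older)

        with open(os.path.join(source, "b.txt"), "w") as f:
            f.write("beta, edited")
        make_snapshot(source, older, newer)

        unchanged_shared = os.path.samefile(
            os.path.join(older, "a.txt"), os.path.join(newer, "a.txt"))
        changed_shared = os.path.samefile(
            os.path.join(older, "b.txt"), os.path.join(newer, "b.txt"))
        return unchanged_shared, changed_shared


if __name__ == "__main__":
    print(demo_snapshots())
```

Both snapshot directories look like full backups when you browse them, but the unchanged file occupies disk space only once.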

I built a system just to be a backup storage server; it polls many of my clients' systems to provide them with off-site backups. I can easily do this over the Internet because rsync, the "man behind the curtain," is itself quite clever. It detects byte-level changes in large files and sends only the changes across the network. So if you have a huge file to save, you need to haul it back in its entirety only once. Subsequent saves of the file will require only the bandwidth needed to transfer the changed data.
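
The following Python sketch illustrates the general idea of sending only what changed. It is a deliberate simplification and not rsync's actual algorithm: it compares fixed, aligned blocks by checksum, whereas rsync uses a rolling checksum so it can match data at any offset. The function names and the tiny block size are assumptions chosen for the demonstration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the demo is easy to follow


def block_signatures(data, block_size=BLOCK_SIZE):
    """Receiver side: one checksum per block of the copy it already holds."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def make_delta(new_data, old_signatures, block_size=BLOCK_SIZE):
    """Sender side: list (block_index, bytes) only for blocks that differ."""
    delta = []
    for index, offset in enumerate(range(0, len(new_data), block_size)):
        block = new_data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_signatures) or old_signatures[index] != digest:
            delta.append((index, block))
    return delta


def apply_delta(old_data, delta, new_length, block_size=BLOCK_SIZE):
    """Receiver side: patch the old copy using the blocks that were sent."""
    blocks = [old_data[i:i + block_size]
              for i in range(0, len(old_data), block_size)]
    for index, block in delta:
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]


if __name__ == "__main__":
    old = b"AAAABBBBCCCCDDDD"
    new = b"AAAABBXBCCCCDDDD"
    delta = make_delta(new, block_signatures(old))
    print(delta)
    print(apply_delta(old, delta, len(new)) == new)
```

In this example, only one 4-byte block out of the 16-byte file has to cross the network, which is the effect described above on a much smaller scale.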

Another benefit is that rsync is cross-platform, so it isn't constrained to *nix systems. I have one client (a dentist) who has a digital dental system hosted on a Windows 2000 server. He brings his Windows XP laptop in daily, connects it to his network, and then runs a command file I wrote that syncs the database and images from his server to his laptop. At the end of the day, the laptop comes home with him, where it functions both as an off-site backup and as a way for him to look up patient information when he gets an emergency call. Not bad for software that costs nothing!

A Dedicated Backup Server

Moving up the line in complexity, we'll now turn our attention to another jewel: BackupPC. This package allows you to set up a server that can be configured to save data not only from *nix machines, but from Windows machines as well. Best of all, it does this without requiring you to install any client software. While I have not yet deployed a BackupPC machine, it is on my list to replace the rsnapshot server I mentioned earlier.

I have seen this package in action, and it's quite remarkable. To start with, the interface is Web-based, so it's easy to configure. The software provides system status information and full reporting services so you can tell at a glance which machines have been backed up and which ones are lacking. You can have the software notify users that their machines are not backed up and even have the software "snipe" machines that connect infrequently. (By that, I mean you can have it continue to attempt to contact a machine until it successfully does, at which time it will do the backup.) Like rsnapshot, BackupPC utilizes hard links but goes one step further: It checks to see if a file is common to more than one PC. If it is, it doesn't keep multiple copies (one from each machine) but, instead, has the machines share one copy. That really minimizes storage requirements!
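
The shared-copy idea can be sketched in a few lines of Python. This is an illustration of the general technique (a pool of files keyed by a digest of their contents, with each machine's backup tree hard-linked into the pool), not BackupPC's own code or on-disk layout. The function names, directory names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def store_in_pool(pool_dir, backup_path, data):
    """Store data once in pool_dir and hard-link it at backup_path.

    Hypothetical helper: the pool file is named after the SHA-256
    digest of the contents, so identical files map to the same entry.
    """
    digest = hashlib.sha256(data).hexdigest()
    pooled = os.path.join(pool_dir, digest)
    if not os.path.exists(pooled):
        with open(pooled, "wb") as f:   # first time this content is seen
            f.write(data)
    os.makedirs(os.path.dirname(backup_path), exist_ok=True)
    os.link(pooled, backup_path)        # the machine's tree points at the pool copy
    return digest


def demo_pool():
    """Back up two pretend PCs that share one identical file."""
    with tempfile.TemporaryDirectory() as work:
        pool = os.path.join(work, "pool")
        os.makedirs(pool)
        shared = b"the same installer on both PCs"
        store_in_pool(pool, os.path.join(work, "pc1", "setup.exe"), shared)
        store_in_pool(pool, os.path.join(work, "pc2", "setup.exe"), shared)
        store_in_pool(pool, os.path.join(work, "pc2", "notes.txt"), b"only on pc2")
        pool_files = len(os.listdir(pool))
        link_count = os.stat(os.path.join(work, "pc1", "setup.exe")).st_nlink
        return pool_files, link_count


if __name__ == "__main__":
    print(demo_pool())
```

Three files were backed up, yet the pool holds only two: the file common to both machines is stored once and carries three names (one in the pool and one in each machine's tree).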

Perhaps the best feature of BackupPC is the fact that users can restore their own files. You can configure security such that users can access their files and, using the browser interface, have files delivered in either raw form (as they came from their system) or encapsulated in a zip or tar file. This feature helps to eliminate your need to get involved in restoring data on your users' behalf. That in itself makes BackupPC worth serious consideration.

Closer to Enterprise Grade

If both rsnapshot and BackupPC are insufficient for your needs, I suggest you look at either AMANDA (The Advanced Maryland Automatic Network Disk Archiver) or Bacula (The Network Backup Solution). Both of these solutions are closer to what most people consider to be "enterprise-grade."

AMANDA was developed at the University of Maryland at College Park in the early '90s. Originally designed for saving data from UNIX systems, it has been enhanced by its user community to be able to save data from Windows-based systems, via Samba. AMANDA is capable of using a diverse group of tape drives, including tape libraries, and will stage data on its local hard drive(s) so that it can stream data at the maximum speed of the interface or drive, whichever is slower. Perhaps AMANDA's biggest flaw is in its support of Windows machines. Since AMANDA communicates with the machines via Samba, it isn't privy to the security information stored on the local Windows box. As a result, restoring files can be a bit more problematic.

Bacula has all of the features of AMANDA, but instead of relying on Windows file-sharing to access the files it has been configured to save, it utilizes client-side software. Thus it has complete access to the file attributes and can save those with the file. As a result, restoring data is a more transparent and painless process.

Both of these products have proven to be quite capable for small and medium-size businesses, and both have been in use long enough to have solid reputations.

Fill in the Gaps of Service

As IT managers, we tend to think only of the commercial solutions to our needs. Since there are no advertising dollars coming from open-source products, the trade magazines don't often mention or review them. That's fine, as that's business.

Certainly, using an open-source solution for your storage management needs includes the expense of getting it installed, configured, and working properly. But you have those same costs with the commercial products as well. Once you have the first box deployed, subsequent installs go quickly, lowering the time investment. And that's true for both open-source and commercial products. The difference is, the open-source products don't have the per-seat or per-machine costs associated with them that the commercial products have. Want support? You can buy that for the open-source products just like you can for the commercial products, but you don't need to have a current contract to get updates for the open-source products.

The choice between open source and commercial is not an all-or-nothing proposition. I encourage you to visit the Web sites of the products I mentioned and do a bit of reading. Perhaps you can save yourself some money and improve your recovery chances by employing both types of products.

That's it for this month. I wish everyone in the United States a happy Thanksgiving. See you next month!


Barry L. Kline is a consultant and has been developing software on various DEC and IBM midrange platforms for over 23 years. Barry discovered Linux back in the days when it was necessary to download diskette images and source code from the Internet. Since then, he has installed Linux on hundreds of machines, where it functions as servers and workstations in iSeries and Windows networks. He co-authored the book Understanding Linux Web Hosting with Don Denoncourt.