The Linux Letter: Open-Source Enterprise Databases

As a long-time i5 bigot, my DBMS of choice is DB2. While I have read one anecdote concerning DB2 and corrupted data, my personal experience has been that DB2 is rock-solid, high-performance, SQL standards-compliant, and chock-full of interesting and useful extensions. There is no doubt about it; IBM has a superb product with DB2.

There are times, however, when an iSeries is nowhere in sight, let alone in-network, so an i5-hosted DBMS is not an option. In such cases, what DBMS do you use? The most common options include DB2 on Linux, UNIX, or Windows; Oracle on Linux, UNIX, or Windows; or Microsoft's SQL Server on the only platform on which it is supported: Windows. Any of these products would be satisfactory. But what if you have the additional constraint of cost? Do you have to sacrifice enterprise quality and features because you can't afford them?

The Open-Source Alternatives

Open-source DBMS products have been reliably providing the data store for the Web's content management systems (such as phpBB and PostNuke) for many years. Given the number of Web sites using a CMS, there must be a phenomenal amount of data under open-source DBMS stewardship. In spite of the success in this arena, or maybe because of it, most programmers don't consider an open-source DBMS for "serious" projects, but instead look at these packages as toys. That impression might have been true years ago. But it is certainly not the case now.

The two most well-known open-source DBMS packages are MySQL and PostgreSQL, the former having the most name recognition. There are other very good ones from which to choose, but these are the two primary examples. I'll talk about MySQL in next month's article, when I give the management view of LAMP (where MySQL is represented by the M). This month, I'll talk about my favorite of the two: PostgreSQL.

What Is Enterprise-Ready?

Before introducing PostgreSQL, let's first define "enterprise-ready." People have differing definitions as to what constitutes an enterprise-quality database. Since we're all familiar with our i5 DB2 DBMS, I think I'm safe in assuming that we can all agree that DB2 for i5/OS is enterprise-ready. Thus, I'll use it as an example of what features constitute an enterprise-ready database.

The primary attribute a DBMS must possess if it wants to be called "enterprise-ready" is that it conforms to the ACID model, which specifies that the DBMS must ensure atomicity (if one part of a transaction fails, the entire transaction is rolled back), consistency (only valid data can be written into the database), isolation (concurrent transactions cannot affect each other's execution), and durability (once a transaction is committed, its changes survive an unexpected hardware or software failure). DB2 certainly meets this requirement.
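Atomicity is easy to see in miniature. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a full DBMS (it is not PostgreSQL or DB2), and the accounts table, the account names, and the helper functions are all hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint is the consistency rule: no negative balances.
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (name, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 50)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so that a failed debit leaves work to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

With these definitions, an attempt to transfer more than the source account holds fails on the debit, and the credit that had already been applied is undone along with it. Either both updates happen or neither does.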

An enterprise DBMS should be scalable, offering excellent performance whether serving a single process or a horde of them. And it must do well with databases that vary from the "almost too small to use a database" up to the terabyte monsters. DB2 qualifies on this count.

An enterprise DBMS should have an extensive set of APIs that an application can utilize to interact with the database. DB2 has a plethora of them, including the traditional native access we've all used with RPG or COBOL.

Other advanced enterprise DBMS features that are part of DB2 include triggers, views, inheritance, sequences, stored procedures, cursors, user-defined data types, and two-phase commit. Certainly, you might add other things to the list, but any DBMS that embodies these can no doubt be called "enterprise-ready."

Eliminate the Suspense

Now that we've defined our parameters, let's introduce this month's subject: PostgreSQL (PG). This DBMS is no babe in the woods. It is the result of 20 years of development. Thus, you can rest easy if you're concerned about using software that is not quite mature (and who isn't?). As to PG's qualifications for membership in the elite DBMS club, I'm sure that you have already guessed what I'm going to say, but if not, let me eliminate the suspense. PostgreSQL has all of the features I outlined in the last section. Thus, by our definition (assuming that you are amenable to our definition), it is "enterprise-ready." An executive summary of PG's features can be found on its Web site.

Of course, most IT guys are a conservative lot, unwilling to lead the pack. If you are concerned that adopting PG might be too risky, check out the PostgreSQL support contracts page, where you will find a list of some of the organizations that use the product. Included in the list are such lightweights as the American Chemical Society, BASF, Cisco, and the U.S. General Services Administration (GSA). So if you're taking a risk using PG, you'll be in good company.

Oh, and one other thing: The 8.x series of PG runs natively on Windows! Earlier versions required the Cygwin environment to run, and the performance was terrible. This latest iteration performs very well, so if you are disinclined to use PG because you think that you need to set up a Linux box to do so, think again.

Get the Latest

To get access to the full feature-set of PG, you'll need to have the latest version, which, at the time of this writing, is Version 8.1.2. If you are going to load this on a Windows platform, you can disregard the rest of this section. If you're running commercial Linux, you have some decisions to make because the "Big Two" enterprise Linux distributions, Red Hat and SuSE, offer only Version 7.4.x with their latest releases. This leaves you with two options.

Option one is to forgo the latest and greatest features (such as two-phase commit and IN/OUT parameters in functions) so that you can use a vendor-blessed version of the DBMS. This makes updates to the DBMS a natural extension of the vendor's software update system, creating no extra work for you. To be truthful, you aren't giving up much in the features department should you decide to go this route.

Option two would be the natural choice if you do need all of the features that the current incarnation of PG offers or if you're a compulsive tinkerer. Be aware that if you choose this path, you've lost the official sanction of the company that provided your distribution, and should you find yourself having trouble, you've lost access to any help from them. This caveat can safely be ignored if you don't have a paid support contract (you aren't going to get support in this case).

To get the 8.x version of PG, head on over to the binaries download site and select the appropriate directory: linux or win32. Red Hat/Fedora users can download RPM packages with which they can install PG. SuSE users may be able to use the Red Hat packages, but if not, the source RPMs are available too.

Not using Red Hat or SuSE? If you happen to be using CentOS 4, you're still in luck. The developers have created RPMs that integrate nicely into their distribution. (I'll show you how to retrieve them later.) If you can't find any binary suitable for your Linux distro, then there's always "Use the force. Compile the source!"

Simple to Install

PostgreSQL is simple to install, thanks to the efforts of the PostgreSQL team and contributors.

If you are using Windows, then use the MSI package from the PG Web site. The installation uses the standard "wizard" program, so you'll find it simple to use. The default installation delivers a feature-complete package. During the installation, you'll have the opportunity to create a service account under which the PostgreSQL service will run. Also, you will be asked if you want to allow connections from other networked machines. If you choose to disallow that initially, you can change it later. In other words, don't worry about getting the installation exactly right the first time. The configuration is stored as a text file (not as a registry entry), so it is easier to alter it later on. Just look for the "configuration files" in the PostgreSQL 8.1 Start Menu item. Your initial interface to the DBMS will be through the pgAdmin tool (Figure 1), also available in the Start Menu.

http://www.mcpressonline.com/articles/images/2002/OpenSourceEnterpriseDatabasesV4--02080600.jpg

Figure 1: This is the GUI interface to PostgreSQL, running on Windows XP.

Linux users have a greater number of options for their installation method, including building PG from the source (which is way beyond the scope of this article). In the case of RPM-based distributions (such as Red Hat, SuSE, and their clones) you will want to install the client and server packages, the documentation, and the JDBC drivers (assuming that you do Java development). On Red Hat, you'd use the up2date program. On SuSE, it would be YaST, and on the clones (e.g., CentOS) it would be yum. The packages are postgresql, postgresql-server, postgresql-docs, and postgresql-jdbc. If you need them, there are also packages that allow access to the DBMS via PHP, Perl, and ODBC. As an example, using yum, you'd issue this command:

yum install postgresql postgresql-server postgresql-docs postgresql-jdbc 

Earlier, I mentioned that CentOS provides PG 8. To get it, you need to edit the file /etc/yum.repos.d/CentOS-Base.repo. Search for this line: [testing]. This is the stanza that defines the repository name. Below it, you should find a line that starts with "baseurl." Change it to this:

baseurl=http://dev.centos.org/centos/$releasever/testing/$basearch/

Also, to avoid problems with package signing, I changed the gpgcheck=1 to gpgcheck=0. While this may not be recommended for normal use (since it precludes authenticity checking on the packages), I don't think it represents a great security issue for this purpose, since I'm loading it on a test machine.
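If you want to double-check your edit before running yum, a few lines of Python can read the stanza back. This is just a sketch under stated assumptions: the sample text below is hypothetical (only the [testing] name, the baseurl, and gpgcheck=0 come from the steps above; the name and enabled lines are my guesses at what the file contains), and repo_settings is a helper name I made up.

```python
import configparser

# Hypothetical excerpt of /etc/yum.repos.d/CentOS-Base.repo after editing.
# The name= and enabled= lines are assumptions for illustration.
SAMPLE_REPO = """\
[testing]
name=CentOS-$releasever - Testing
baseurl=http://dev.centos.org/centos/$releasever/testing/$basearch/
gpgcheck=0
enabled=0
"""


def repo_settings(repo_text, repo_id):
    """Return the key/value pairs of one stanza of a yum .repo file."""
    # interpolation=None leaves yum variables such as $releasever untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return dict(parser[repo_id])


if __name__ == "__main__":
    settings = repo_settings(SAMPLE_REPO, "testing")
    print("baseurl :", settings["baseurl"])
    print("gpgcheck:", settings["gpgcheck"])
```

To check the real file, read its contents with open() and pass the text in place of SAMPLE_REPO.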

Now, you can install PG 8 by issuing the yum command:

yum --enablerepo=testing install ...

Useful Configuration

A Windows installation will be usable almost immediately. To make the Linux version fully functional, you need to do just a couple of things.

First, initialize your DB cluster by starting the DBMS server. Most distributions can do this with the command /etc/init.d/postgresql start. (If you're using Red Hat Enterprise, replace "postgresql" with "rhdb.")

Next, I assume that you'll want to connect to PG via TCP/IP (either ODBC or JDBC), so you'll need to enable that. By default, PG only allows connections from the local host, and then only on a UNIX socket.

To configure PG 7.x and earlier for TCP/IP access, open the file /var/lib/pgsql/data/postgresql.conf and find this line:

#tcpip_socket = false  

Change it to this:

tcpip_socket = true

The configuration file for PG 8 has a slightly different configuration for this purpose. It's the same file, but instead of tcpip_socket, you'll need to add this:

port = 5432               # TCP port on which the server listens (5432 is the default)

listen_addresses = '*'    # accept connections on all network interfaces
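If you would rather script this change than make it by hand, something along these lines would do. It is a minimal sketch, not a full parser for postgresql.conf: set_conf_option is a hypothetical helper that works on the file's text, replaces the first matching line (commented out or not), drops any trailing comment on that line, and appends the setting if it isn't there at all.

```python
import re


def set_conf_option(conf_text, key, value):
    """Return conf_text with `key = value` set, uncommenting the line if needed."""
    # Match the setting at the start of a line, optionally commented out.
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{key} = {value}"
    if pattern.search(conf_text):
        # Replace only the first occurrence; a function avoids escape handling.
        return pattern.sub(lambda _match: new_line, conf_text, count=1)
    # Setting not present at all: append it on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


if __name__ == "__main__":
    sample = "#listen_addresses = 'localhost'\n#port = 5432\n"
    sample = set_conf_option(sample, "listen_addresses", "'*'")
    sample = set_conf_option(sample, "port", "5432")
    print(sample, end="")
```

Because the function takes and returns plain text, you can try it on a copy of the file first and compare the result before touching the real one.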

Now, we need to configure access to the database. It's actually quite simple. The file /var/lib/pgsql/data/pg_hba.conf is a simple text file with lines that contain the following fields:

  • TYPE is "local" or "host" (where you want to connect from).
  • DATABASE is the database name or "all."
  • USER is the PostgreSQL user name.
  • IP-ADDRESS is the host address or network address from which to allow connections.
  • IP-MASK is the network mask.
  • METHOD is the authentication method.
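To make the field layout concrete, here is a rough Python sketch that splits a pg_hba.conf line into the fields just listed and tests whether a client address falls inside the address/mask pair. The names HbaRule, parse_hba_line, and address_matches are my own inventions, and the sketch handles only the simple forms described here ("local" lines with no address, and "host" lines with a separate mask); quoted values and other variants are not covered.

```python
import ipaddress
from typing import NamedTuple, Optional


class HbaRule(NamedTuple):
    conn_type: str
    database: str
    user: str
    ip_address: Optional[str]
    ip_mask: Optional[str]
    method: str


def parse_hba_line(line):
    """Parse one pg_hba.conf line; return None for blank or comment lines."""
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    fields = text.split()
    if fields[0] == "local" and len(fields) == 4:
        # UNIX-socket rules carry no address or mask.
        conn_type, database, user, method = fields
        return HbaRule(conn_type, database, user, None, None, method)
    if fields[0] == "host" and len(fields) == 6:
        return HbaRule(*fields)
    raise ValueError(f"unsupported pg_hba.conf line: {line!r}")


def address_matches(rule, client_ip):
    """Return True if client_ip falls inside the rule's address/mask range."""
    if rule.ip_address is None:
        return False  # "local" rules are not matched by IP address
    network = ipaddress.ip_network(
        f"{rule.ip_address}/{rule.ip_mask}", strict=False
    )
    return ipaddress.ip_address(client_ip) in network
```

A mask of 255.255.255.255 means every bit of the address must match, so a rule for 127.0.0.1 with that mask covers exactly one address: the local machine.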

DB authentication is a topic unto itself and beyond the scope of this article. You can check out the documentation for further information. For testing purposes, add this line:

host    all     all     127.0.0.1          255.255.255.255              trust

You'll be able to access any database as any user from your own machine. This is certainly not recommended for a production system, so I encourage you to do the research to fully understand PostgreSQL's authentication system, which is extremely flexible and powerful.

To ensure that all of your configuration changes are activated, restart PG:

/etc/init.d/postgresql stop
/etc/init.d/postgresql start

The final step is to create a user and database for your own use. As root, run the following commands (the # and $ characters are the shell prompts):

# su - postgres
$ createuser youruserid
$ createdb -O youruserid youruserid
$ exit

Now, logged on as yourself, issue the command "psql" and you'll be in the command-line tool for accessing PostgreSQL, which is similar to the STRSQL command on the i5. If you want the GUI, you can download and install the Linux version of pgAdmin from the project's Web site.

Addressed Soon

On a side-by-side comparison chart, PG compares favorably with any major commercial DBMS. The only major difference is with the management tools. The strictly open source PG doesn't have the same high-quality management tools that its commercial brethren do. The commercialized versions of PostgreSQL, however, do have tools that are starting to approach them. Since PG is open source, a number of companies have adopted and then extended the base package from PostgreSQL.org. You can find out about these on the PG Web site. Even Red Hat ships a customized version of PG, called rhdb. So you know that whatever the tools lack today will surely be addressed soon.

Take a Serious Look

I continue to be amazed by the high-quality software that is turned out by the open-source community. Any enterprise needs a high-quality DBMS, and PostgreSQL surely fits the bill. While my true allegiance will always be to my beloved DB2, there are times when it makes no sense to deploy it.

I truly believe that you should be taking a serious look at PostgreSQL. Its competition sure is! That PG is now a serious contender is evidenced by the willingness of the commercial DBMS vendors to release no-charge versions of their products for development or small deployments. Can you think of any other reason for them to do it?

Have fun with PG, and I'll see you next month!

Barry L. Kline is a consultant and has been developing software on various DEC and IBM midrange platforms for over 23 years. Barry discovered Linux back in the days when it was necessary to download diskette images and source code from the Internet. Since then, he has installed Linux on hundreds of machines, where it functions as servers and workstations in iSeries and Windows networks. He co-authored the book Understanding Linux Web Hosting with Don Denoncourt.
