A lot of that was due to the fact that the AS/400 was a tightly knit platform, and all of its languages accessed the same database (which we now know as DB2/400) via the same sorts of operations: RPG opcodes like SETLL, READE, and CHAIN or their COBOL counterparts. But as the AS/400 has transmogrified to the iSeries and now the i5, it has become ever more open and more supportive of other standards, including other languages and other database access methodologies.
In this article, I'm going to compare the various database access techniques, including ODBC and its Java cousin, JDBC (including a little bit about some of the major JDBC vendors). Next, I'll delve into the Java world more deeply and give you a look at some of the acronyms currently being touted as the "next great thing" in persistence. And finally, I'll introduce you to a few other "nontraditional" persistence packages (although as recent a participant as Java is in the persistence game, I find it a little questionable to label things as traditional and nontraditional).
As Usual, a Little History
You know me. I can't pass up an opportunity to put things into historical perspective. This subject is no different. A long time ago I bought a TI-99/4A for my son. It was a pretty barebones machine; it didn't even have a tape drive. In order to play a game, he had to type in the entire program in hexadecimal. It was pretty cool the first time he did it, but when he shut the computer off, that meant that the next time he turned it on, he had to re-key the entire program.
In order for a business computer to serve any meaningful purpose, the information you enter must be accessible in the future, even if you shut off the computer. The TI's answer to that was a cassette tape drive. However, tape drives of any form had an inherent problem in that you could only read data off the tape sequentially; finding a specific block of data on a tape meant physically reeling or unreeling the tape until you got to the right point. Remember all those awful sci-fi pictures of the '60s and '70s that showed the giant cabinets full of whirling tape drives stuttering back and forth? That behavior was due to the tape drive trying to find blocks of data, even when they were fairly close together on the tape.
And thus the disk drive was born. First came floppy disks, or diskettes (for the record, there were non-Winchester "hard disk" drives in the '50s and '60s, but those technologies were quite expensive and have long since gone the way of Betamax and 45 RPM records). For the TI, you could get a single-sided, single-density disk controller that supported diskettes with up to 90 KB of storage. Next came a revolutionary concept from IBM, the Winchester hard disk drive. The first one I ever saw was an 8-inch Winchester that was the size of a DeskJet printer, held a whopping 5 MB of data, and sold for about $2,000.
Well, we know what happened next: mass production, innovation, manufacturing technology. All of these combined to move the price point of disk drives from $400/MB to today's cost of less than a dollar per gigabyte--a 400,000-fold decrease in price, even before inflation. And with more data came more choices.
To start, we had relatively simple data access. Typically, data was stored in fixed-length files that we sorted into the appropriate sequence and then processed. For example, we might sort data by customer and then by item in order to print a report showing what items were purchased by a particular customer. On the other hand, we might sort by item and then by customer to print a report showing which customers purchased specific items. In either case, the data was physically sorted, which meant you needed not only the space for the original file, but also enough space on the disk for a temporary copy of the file containing the sorted records.
For a while, we used network and hierarchical databases, in which records pointed to one another using specific record numbers. While these designs provided excellent performance, they were complex, fragile, difficult to maintain, and prone to corruption. Losing one link in a hierarchical database could result in a serious loss of database integrity and a lengthy rebuild process.
Key Files (Native DB2/400)
Later, someone realized it might be quicker to create "key files." A key file (which in OS/400 terms is a "logical" file) was a list of the key values for a given record and its record number in the actual file (or as we call it, the "physical" file). This way, when a record was added, you simply updated the key file. You could keep multiple "views" of the same physical file without having to sort it over and over. I won't get into a deep discussion about this, but in general if you wanted to access a file in a given sequence, you did it by reading the appropriate key file and then getting the record identified by the key file. While the names have changed to protect the innocent, including terms like ISAM and VSAM, the concept of accessing data through a separately maintained key file is still the most powerful technique known to programming.
Examples of databases using key files include everything from mainframe systems (VSAM, IMS) to midrange (DB2/400) to PCs (dBase, Clipper, Paradox, Btrieve). And as we'll see in a moment, in order to provide reasonable performance, even so-called relational databases in reality rely on key files.
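The key-file idea is easy to sketch in code. Here's a minimal, purely illustrative Java version (the class and record layout are my own invention, not any actual database's implementation): records live in arrival sequence in a "physical file," while two ordered maps act as "logical files," mapping key values to record numbers. The same data can then be read in either sequence without ever sorting it.

```java
import java.util.*;

public class KeyFileDemo {
    // The "physical file": records stored in arrival sequence, addressed by record number.
    static List<String[]> physical = new ArrayList<>();

    // Two "key files" (logical views): key value -> record number in the physical file.
    static TreeMap<String, Integer> byCustomer = new TreeMap<>();
    static TreeMap<String, Integer> byItem = new TreeMap<>();

    static void write(String customer, String item) {
        physical.add(new String[] { customer, item });
        int rrn = physical.size() - 1;               // relative record number
        byCustomer.put(customer + "|" + item, rrn);  // composite keys keep entries unique
        byItem.put(item + "|" + customer, rrn);
    }

    public static void main(String[] args) {
        write("ACME", "WIDGET");
        write("BAXTER", "GADGET");
        write("ACME", "GADGET");

        // Read the same physical data in customer sequence -- no sort required.
        for (int rrn : byCustomer.values()) {
            String[] rec = physical.get(rrn);
            System.out.println(rec[0] + " bought " + rec[1]);
        }
    }
}
```

Adding a record costs two small map updates instead of a full re-sort, which is exactly the trade that made keyed access the workhorse it is.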
But what about SQL? Why is SQL so prevalent? Well, for one thing, it's relatively easy to use, especially for simple queries. In fact, that's what the name stands for: Structured Query Language. Second, SQL is the de facto standard for accessing relational databases (also known as relational database management systems, or RDBMSs).
"Whoa, Joe! You mean SQL isn't the only language for accessing an RDBMS?"
"That is correct, Grasshopper! Relational databases are defined by the 12 rules of Dr. E.F. Codd. And in his 12 rules, he simply states that there must be a common language for all operations, from creating databases to updating data. He did not specify what that language is, although SQL has been the standard ever since it was spawned from the original RDBMS language, IBM's SEQUEL."
So while SQL is certainly a standard, that's not because it was the language created by the designers of the RDBMS concept or because it was the chosen language for the first true RDBMS. In fact, there is no language for "true" RDBMSs; a little-known fact is that there are no true relational databases, at least none that support all of Dr. Codd's 12 rules.
Not only that, but as I stated in the previous section, most RDBMSs make use of key files in order to perform adequately. If you regularly access a file by a given set of fields, the standard advice is to create an index over those fields in order to boost performance. What is an index? It's a logical file without any record selection criteria. To get record selection, you instead create a view with the appropriate selection parameters.
Comparing SQL and DDS
In order to get the functionality of a DDS logical file in SQL, you must combine a view and an index. For ad hoc inquiries, SQL does have benefits, much like those we have enjoyed in the past using OPNQRYF. On the other hand, SQL has more flexibility, especially when joining multiple tables. In general, I would say SQL has more benefits in situations where you would normally use OPNQRYF but fewer benefits where you would use a regular logical file. Logical files also have the added benefit of allowing you to seek on them, positioning your cursor anywhere within the file by key; this gap in SQL is almost impossible to overcome in some instances.
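To make the comparison concrete, here's a hedged sketch of the closest SQL analog to SETLL followed by sequential READs: a query that positions at the first row at or beyond a key value and reads forward in key order. The table and column names (CUSTMAST, CUSTNO) are hypothetical, and the helper method is my own illustration, not a standard API.

```java
import java.sql.*;

public class KeyedPositioning {
    // Build the SQL equivalent of SETLL + READ on a keyed file:
    // position to the first record at or beyond the key, then read forward in key order.
    static String setllEquivalent(String table, String keyColumn) {
        return "SELECT * FROM " + table
             + " WHERE " + keyColumn + " >= ?"
             + " ORDER BY " + keyColumn;
    }

    static void readForwardFrom(Connection conn, String startKey) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                setllEquivalent("CUSTMAST", "CUSTNO"))) {
            ps.setString(1, startKey);          // the SETLL key value
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {             // each rs.next() is a READ
                    System.out.println(rs.getString("CUSTNO"));
                }
            }
        }
    }
}
```

Note what's missing compared to a logical file: there's no cheap way to reposition mid-stream (READP, SETGT, and so on) without issuing a fresh query, which is the gap described above.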
In order to round out the list, I need to include object databases (object database management systems, or ODBMSs). In general, the two distinctions between ODBMSs and RDBMSs are that ODBMSs tend to support more complex data, especially when it comes to BLOBs, and that the objects are easier to use in the data retrieval process; you can actually call methods in the objects themselves.
While not prevalent today, the ever-growing popularity of object-oriented languages is likely to make OO databases the Holy Grail of database design, at least for certain applications. And while I think we can assume that the language will bear a strong resemblance to SQL, I think you should realize that when it comes to advanced database access, SQL may not be all you need in the long run.
ODBC and JDBC
Here I have a real hot button. I'll try to be as impartial as possible, but this idea simply makes me crazy. ODBC, or Open Database Connectivity, is a standard originally published by a consortium led by Microsoft back in the early 1990s. The SQL Access Group (SAG) begat the X/Open Company, which in turn merged into the Open Group. SAG produced the SQL/CLI standard, upon which ODBC is based. SQL/CLI is really one of five standards of the ANSI/ISO SQL standards committee. So while we often think of ODBC as a Microsoft standard, it is really an ANSI/ISO standard that defines how any ODBC-compliant application can communicate with any ODBC-compliant database.
And I have no problem with a standard linkage that allows a program to access a database. Personally, I think RPG's use of the externally described file is the best way to do it, because defining such things at compile time leads inevitably to performance advantages. But I can see situations where it's nice to be able to simply send a structured statement to the database engine and have it return some data. The problem to me is when this standard is used in place of distributed processing, which is what has happened. Since programmers can use ODBC in place of having to actually define their own message protocol, what happens is that client applications send ODBC requests to database engines on servers. The problem here is in the binding of table and column names between the client and the server.
I find it especially ironic that the same people who support ODBC also purport that one of the distinct advantages of SQL is that you don't have to change your applications when your database changes. This is perhaps true when you have to add a field, but it's certainly not true when you have to change a field type, rename a field, delete a field, move a field from one file to another, or change how a field works. Those changes require changes to the original application. This dependence on the database layout is multiplied when ODBC is used as a transaction distribution agent, because now, rather than simply having to change the applications on the host, you have to change the client code and then distribute it to all the clients. It is this cost that I think is particularly difficult to justify in order to save a few bucks on creating your own messaging mechanism.
And since JDBC is essentially Java's counterpart to ODBC (both grew out of the same call-level interface concepts), the same issues apply. If the Java application is running on the same machine as the database, I have no real problems with it, but if the program is running on a client machine, then you have bound the client and server--to the detriment of your long-term architectural growth.
Wait, Wait, Back Up a Moment
Is this meant to imply that there is another way? Absolutely. The communication between client and server should be at the business level, not the database level. Messages should be passed between the two boxes whose relationship with the physical data on disk is coincidental at best. How these messages are formatted is secondary to the fact that the database can change completely without affecting the application at all. In fact, the message itself can change without the application being affected by simply including a version number in the message. Older messages can be supported for as long as needed and can even indicate to the application that they are getting "stale" (for example, a warning message might pop up saying that the user needs to update the application within the next 30 days). This handles the problem of "all at once" distribution of client changes.
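The versioning idea above can be sketched in a few lines. This is a deliberately minimal illustration; the message format ("version;customer;item;quantity") and the version numbers are hypothetical, and a real system would use a richer envelope.

```java
public class OrderMessageHandler {
    // Hypothetical message format: "version;customer;item;quantity".
    // Older versions remain readable; stale ones trigger a warning, not a failure.
    static final int CURRENT_VERSION = 3;
    static final int OLDEST_SUPPORTED = 1;

    static String handle(String message) {
        String[] fields = message.split(";");
        int version = Integer.parseInt(fields[0]);

        if (version < OLDEST_SUPPORTED) {
            return "REJECT: unsupported message version " + version;
        }
        String result = "OK: order for " + fields[1];
        if (version < CURRENT_VERSION) {
            // The message still works, but the client should plan to upgrade.
            result += " (WARNING: client is stale; please update within 30 days)";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(handle("3;ACME;WIDGET;10"));
        System.out.println(handle("2;BAXTER;GADGET;5"));
    }
}
```

Because the server interprets the version field, old clients keep working while new ones roll out gradually--no "all at once" redistribution required.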
EJB, JDO, and SDO
Take a good look at the Java standards and you'll see that, outside of the actual definition of the language itself (which is quite good), the rest of the standards lack much substance. It's as if the Java committee decided that they would do only 80% of any given standard, leaving the hard part to the vendors. Not only that, but many portions of these standards are "optional," meaning the standards are really little more than recommendations. Part of this may have been pragmatic in order to keep the interest in Java high by allowing companies to avoid certain things, while at the same time allowing them to incorporate technologies that might be less than perfect, but the result is that the Java committees haven't produced anything useful for application developers for some time now.
Maybe this is backlash from the horrible experience of Enterprise Java Beans (EJBs). This standard was so over-engineered yet so poorly implemented that it can't be used by any but the most seriously Java-centric of shops. Yet it was acclaimed by many an early adopter of Java as a business application development language. When it became clear that EJB was a failure, Java Data Objects (JDO) became the next great standard. JDO was an attempt at a trimmed-down version of EJB, but once again the theoretical standard was never really run through the basic practical design requirement of determining how it would help existing developers. With no practical way to retrofit existing databases into the JDO methodology, JDO became one more Java-only exercise, all but useless for the real-world goal of developing applications in multiple languages and combining legacy code with new build.
Service Data Objects (SDO) is one more entry that just misses the point. The basic concept is absolutely phenomenal: Objects are generated from the database and can be either strongly or loosely typed. You can create queries with very little specific information about the target database by returning very loosely typed data. At the same time, in a business application, your information can be as strongly typed as you need, allowing you to perform required business functions. All that is needed is the detailed information about the object itself. Brilliant! The problem with the SDO standard is that the source of that information is the JDBC metadata, which means you're once again tied to the rows and columns of the database. ARGH!
I'll need some more time to analyze this particular buzzword to see if there's any meat behind it. It would be nice if the SDO folks were smart enough to create an abstract object information class that didn't know what JDBC was and then provide an adapter to convert JDBC metadata to object information. Then you could supply other adapters, such as for XML documents. Then, two applications could sync up with one another through a quick exchange of XML and be up and running.
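Here's roughly what I mean, as a hypothetical sketch (the interface name, its methods, and the adapter are all my own invention, not part of the SDO specification): an object-description interface that knows nothing about JDBC, with adapters feeding it from whatever metadata source you happen to have.

```java
import java.util.*;

// A metadata abstraction that knows nothing about JDBC.
interface ObjectInfo {
    String objectName();
    List<String> propertyNames();
    String propertyType(String name);
}

// One adapter might wrap java.sql.ResultSetMetaData; this one wraps a simple
// map, standing in for metadata parsed from an XML document.
class MapObjectInfo implements ObjectInfo {
    private final String name;
    private final LinkedHashMap<String, String> props;

    MapObjectInfo(String name, LinkedHashMap<String, String> props) {
        this.name = name;
        this.props = props;
    }
    public String objectName() { return name; }
    public List<String> propertyNames() { return new ArrayList<>(props.keySet()); }
    public String propertyType(String n) { return props.get(n); }
}

public class ObjectInfoDemo {
    public static void main(String[] args) {
        LinkedHashMap<String, String> p = new LinkedHashMap<>();
        p.put("customerNumber", "String");
        p.put("balanceDue", "BigDecimal");
        ObjectInfo info = new MapObjectInfo("Customer", p);
        // The application works against ObjectInfo only -- the source of the
        // metadata (JDBC, XML, anything else) is hidden behind the adapter.
        System.out.println(info.objectName() + ": " + info.propertyNames());
    }
}
```

With the application coded against the interface, swapping the JDBC adapter for an XML one is exactly the "quick exchange of XML" scenario described above.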
I promised an introduction to persistence engines. I won't bore you with a ton of details, but instead just an overview. Like so many things in Java, the developers come upon something that we've been doing for decades, but it's new to them, so they rechristen it. What we call "updating the database" Java developers call "persistence." And many of the things that we've taken into account over the years are such radical concepts that the Java folks tend to write them into their database engines manually.
In persistence engines, a good example of this is write caching. While OS/400 has evolved to the point where write caching is handled by the operating system in such a way that we don't even think of it, performance on individual write operations using JDBC is so poor that the Java folks have had to come up with their own write caching. This is one of the things that makes the Hibernate persistence framework so fast. Unfortunately, one of the problems with this approach is that since the caching is entirely in the Hibernate software, if the Java Virtual Machine (JVM) gets canceled, you stand an unacceptably high chance of losing data.
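The trade-off is easy to see in a toy write-behind cache. This is a minimal sketch of the general technique, not Hibernate's actual implementation: writes are queued in memory and pushed to storage in batches, which is fast precisely because nothing hits the disk on the caller's path--and that's also why a dead JVM loses whatever is still queued.

```java
import java.util.*;

// A minimal write-behind cache with a configurable flush threshold.
public class WriteCache {
    private final List<String> pending = new ArrayList<>();
    private final List<String> database = new ArrayList<>();  // stands in for real storage
    private final int flushThreshold;

    public WriteCache(int flushThreshold) { this.flushThreshold = flushThreshold; }

    public void write(String record) {
        pending.add(record);                 // fast: no I/O on the caller's path
        if (pending.size() >= flushThreshold) flush();
    }

    public void flush() {                    // one batched I/O instead of many small ones
        database.addAll(pending);
        pending.clear();
    }

    public int persisted() { return database.size(); }
    public int atRisk() { return pending.size(); }  // lost if the JVM dies right now
}
```

Everything reported by atRisk() is the exposure window: records the application believes are written but that exist only in the JVM's heap.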
Other methods have their own issues. A quick search will turn up a number of persistence engines, and you'll see that they fall into a couple of basic types. The older engines simply provide O/R (object-to-relational) mapping, often XML-driven. Another significant segment of the solutions consists of JDO implementations. One particularly interesting one I found was something called JDBCPersistence, which claims to be able to create the required objects on the fly, generating the appropriate Java bytecode. That's something I'd like to see!
Now that you've looked into the world of persistence, especially as seen from the Java perspective, it's time for you to review your own data access needs. I highly recommend that you create some layer that exists between your application objects (whether they're Java objects or arrays of data structures in RPG) and your database. This layer can be used either in local applications on your host or in client applications. By doing this, you'll be able to better handle changes to your database.
Joe Pluta is the founder and chief architect of Pluta Brothers Design, Inc. He has been working in the field since the late 1970s and has made a career of extending the IBM midrange, starting back in the days of the IBM System/3. Joe has used WebSphere extensively, especially as the base for PSC/400, the only product that can move your legacy systems to the Web using simple green-screen commands. Joe is also the author of E-Deployment: The Fastest Path to the Web, Eclipse: Step by Step, and WDSC: Step by Step. You can reach him at