Sun, Mar
0 New Articles

Data Access, Database Engines, and Persistence

  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times
Remember when the biggest problems of database design were how to convert internally described files to externally described files and how to pack dates into the smallest possible space? Yeah, that last one bit us, but back in the '70s and '80s, who knew? We didn't worry about Y2K; we worried about whether to key our physical files or whether to use OPNQRYF instead of logical files. Life was much simpler then.

A lot of that was due to the fact that the AS/400 was a tightly knit platform, and all of its languages accessed the same database (which we now know as DB2/400) via the same sorts of operations: RPG opcodes like SETLL, READE, and CHAIN or their COBOL counterparts. But as the AS/400 has transmogrified to the iSeries and now the i5, it has become ever more open and more supportive of other standards, including other languages and other database access methodologies.

In this article, I'm going to compare the various database access techniques, including ODBC and its Java cousin, JDBC (including a little bit about some of the major JDBC vendors). Next, I'll delve into the Java world more deeply and give you a look at some of the acronyms currently being touted as the "next great thing" in persistence. And finally, I'll introduce you to a few other "nontraditional" persistence packages (although as recent a participant as Java is in the persistence game, I find it a little questionable to label things as traditional and nontraditional).

As Usual, a Little History

You know me. I can't pass up an opportunity to put things into historical perspective. This subject is no different. A long time ago I bought a TI-99/4A for my son. It was a pretty barebones machine; it didn't even have a tape drive. In order to play a game, he had to type in the entire program in hexadecimal. It was pretty cool the first time he did it, but when he shut the computer off, that meant that the next time he turned it on, he had to re-key the entire program.

In order for a business computer to perform any meaningful purpose, the information you enter must be accessible in the future, even if you shut off the computer. The TI's answer to that was a cassette tape drive. However, tape drives of any form had an inherent problem in that you could only read data off the tape sequentially; finding a specific block of data on a tape meant physically reeling or unreeling the tape until you got to the right point. Remember all those awful sci-fi pictures of the '60s and '70s that showed the giant cabinets full of whirling tape drives stuttering back and forth? That behavior was due to the tape drive trying to find blocks of data, even when they were fairly close together on the tape.

And thus the disk drive was born. First came floppy disks, or diskettes (for the record, there were non-Winchester "hard disk" drives in the '50s and '60s, but those technologies were quite expensive and have long since gone the way of Betamax and 45 RPM records). For the TI, you could get a single-sided, single-density disk controller that supported diskettes with up to 90 KB of storage. Next came a revolutionary concept from IBM, the Winchester hard disk drive. The first one I ever saw was an 8-inch Winchester that was the size of a DeskJet printer and held a whopping 5 MB of data and sold for about $2,000.

Well, we know what happened next: mass production, innovation, manufacturing technology. All of these combined to move the price point of disk drives from $400/MB to today's cost of less than a dollar per gigabyte--a 400,000-fold decrease in price, even before inflation. And with more data came more choices.

Data Access

To start, we had relatively simple data access. Typically, data was stored in fixed-length files that we sorted into the appropriate sequence and then processed. For example, we might sort data by customer and then by item in order to print a report showing what items were purchased by a particular customer. On the other hand, we might sort by item and then by customer to print a report showing which customers purchased specific items. In either case, the data was physically sorted, which meant you needed not only the space for the original file, but also enough space on the disk for a temporary copy of the file containing the sorted records.

For a while, we used network and hierarchical databases, in which records pointed to one another using specific record numbers. While these designs provided excellent performance, they were complex, fragile, difficult to maintain, and prone to corruption. Losing one link in a hierarchical database could result in a serious loss of database integrity and a lengthy rebuild process.

Key Files (Native DB2/400)

Later, someone realized it might be quicker to create "key files." A key file (which in OS/400 terms is a "logical" file) was a list of the key values for a given record and its record number in the actual file (or as we call it, the "physical" file). This way, when a record was added, you simply updated the key file. You could keep multiple "views" of the same physical file without having to sort it over and over. I won't get into a deep discussion about this, but in general if you wanted to access a file in a given sequence, you did it by reading the appropriate key file and then getting the record identified by the key file. While the names have changed to protect the innocent, including terms like ISAM and VSAM, the concept of accessing data through a separately maintained key file is still the most powerful technique known to programming.

Examples of databases using key files include everything from mainframe systems (VSAM, IMS) to midrange (DB2/400) to PCs (dBase, Clipper, Paradox, Btrieve). And as we'll see in a moment, in order to provide reasonable performance, even so-called relational databases in reality rely on key files.

Relational Models

But what about SQL? Why is SQL so prevalent? Well, for one thing, it's relatively easy to use, especially for simple queries. In fact, that's what the name stands for: Structured Query Language. Second, SQL is the de facto standard for accessing relational databases (also know as relational database management systems, or RDBMSs).

"Whoa, Joe! You mean SQL isn't the only language for accessing an RDBMS?"

"That is correct, Grasshopper! Relational databases are defined by the 12 rules of Dr. E.F. Codd. And in his 12 rules, he simply states that there must be a common language for all operations, from creating databases to updating data. He did not specify what that language is, although SQL has been the standard ever since it was spawned from the original RDBMS language, IBM's SEQUEL."

So while SQL is certainly a standard, that's not because it was the language created by the designers of the RDBMS concept or because it was the chosen language for the first true RDBMS. In fact, there is no language for "true" RDBMSs; a little known fact is that there are no true relational databases, at least none that support Dr. Codd's 12 rules.

Not only that, but as I stated in the previous section, most RDBMSs make use of key files in order to perform adequately. If you are regularly using a given index over a file, SQL will tell you to create an index in order to boost performance. What is an index? It's a logical file without any record selection criteria. To select specific records, record selection parameters are used to create views.

Comparing SQL and DDS

In order to get the functionality of a logical file in DDS, you must combine a view and an index. For ad hoc inquiries, SQL does have benefits, much like those we have enjoyed in the past using OPNQRYF. On the other hand, SQL has more flexibility, especially when joining multiple tables. In general, I would say SQL has more benefits in situations where you would normally use OPNQRYF but less benefits where you would use a regular logical file. Logical files also have the added benefit of being able to seek on them, positioning your cursor anywhere within the file by key; this lack in SQL is almost impossible to overcome in some instances.

Object Databases

In order to round out the list, I need to include object databases (object database management systems, or ODBMSs). In general, the two distinctions between ODBMSs and RDBMSs are that ODBMSs tend to support more complex data, especially when it comes to BLOBs, and that the objects are easier to use in the data retrieval process; you can actually call methods in the objects themselves.

While not prevalent today, the ever-growing popularity of object-oriented languages is likely to make OO databases the Holy Grail of database design, at least for certain applications. And while I think we can assume that the language will bear a strong resemblance to SQL, I think you should realize that when it comes to advanced database access, SQL may not be all you need in the long run.


Here I have a real hot button. I'll try to be as impartial as possible, but this idea simply makes me crazy. ODBC, or Open Database Connectivity, is a standard originally published by a consortium led by Microsoft back in the early 1990s. The SQL Access Group (SAG) begat the X/Open Company, which in turn merged into the Open Group. SAG produced the SQL/CLI standard, upon which ODBC is based. SQL/CLI is really one of five standards of the ANSI/ISO SQL standards committee. So while we often think of ODBC as a Microsoft standard, it is really an ANSI/ISO standard that defines how any ODBC-compliant application can communicate with any ODBC-compliant database.

And I have no problem with a standard linkage that allows a program to access a database. Personally, I think RPG's use of the externally described file is the best way to do it, because defining such things at compile time leads inevitably to performance advantages. But I can see situations where it's nice to be able to simply send a structured statement to the database engine and have it return some data. The problem to me is when this standard is used in place of distributed processing, which is what has happened. Since programmers can use ODBC in place of having to actually define their own message protocol, what happens is that client applications send ODBC requests to database engines on servers. The problem here is in the binding of table and column names between the client and the server.

I find it especially ironic that people who support ODBC on the one hand at the same time purport that one of the distinct advantages of SQL is that you don't have to change your applications when your database changes. This is perhaps true when you have to add a field, but it's certainly not true when you have to change a field type, rename a field, delete a field, move a field from one file to another, or change how a field works. Those changes require changes to the original application. This dependence on the database layout is multiplied when ODBC is used as a transaction distribution agent, because now, rather than simply having to change the applications on the host, you have to change the client code and then distribute it to all the clients. It is this cost that I think is particularly difficult to justify in order to save a few bucks on creating your own messaging mechanism.

And since JDBC is simply a Java layer over ODBC, the same issues apply. If the Java application is running on the same machine as the database, I have no real problems with it, but if the program is running on a client machine, then you have bound the client and server--to the detriment of your long-term architectural growth.

Wait, Wait, Back Up a Moment

Is this meant to imply that there is another way? Absolutely. The communication between client and server should be at the business level, not the database level. Messages should be passed between the two boxes whose relationship with the physical data on disk is coincidental at best. How these messages are formatted is secondary to the fact that the database can change completely without affecting the application at all. In fact, the message itself can change without the application being affected by simply including a version number in the message. Older messages can be supported for as long as needed and can even indicate to the application that they are getting "stale" (for example, a warning message might pop up saying that the user needs to update the application within the next 30 days). This handles the problem of "all at once" distribution of client changes.


Take a good look at the Java standards and you'll see that, outside of the actual definition of the language itself (which is quite good), the rest of the standards lack much substance. It's as if the Java committee decided that they would do only 80% of any given standard, leaving the hard part to the vendors. Not only that, but many portions of these standards are "optional," meaning the standards are really little more than recommendations. Part of this may have been pragmatic in order to keep the interest in Java high by allowing companies to avoid certain things, while at the same time allowing them to incorporate technologies that might be less than perfect, but the result is that the Java committees haven't produced anything useful for application developers for some time now.

Maybe this is backlash from the horrible experience of Enterprise Java Beans (EJBs). This standard was so over-engineered yet so poorly implemented that it can't be used by any but the most seriously Java-centric of shops. Yet it was acclaimed by many an early adopter of Java as a business application development language. When it became clear that EJB was a failure, Java Data Objects (JDO) became the next great standard. JDO was an attempt at a trimmed-down version of EJB, but once again the theoretical standard was never really run through the basic practical design requirement of determining how it would help existing developers. With no practical way to retrofit existing databases into the JDO methodology, JDO became one more Java-only exercise, all but useless for the real-world goal of developing applications in multiple languages and combining legacy code with new build.

Service Data Objects (SDO) is one more entry that just misses the point. The basic concept is absolutely phenomenal: Objects are generated from the database and can be either strongly or loosely typed. You can create queries with very little specific information about the target database by returning very loosely typed data. At the same time, in a business application, your information can be as strongly typed as you need, allowing you to perform required business functions. All that is needed is the detailed information about the object itself. Brilliant! The problem with the SDO standard is that the source of that information is the JDBC metadata, which means you're once again tied to the rows and columns of the database. ARGH!

I'll need some more time to analyze this particular buzzword to see if there's any meat behind it. It would be nice if the SDO folks were smart enough to create an abstract object information class that didn't know what JDBC was and then provide an adapter to convert JDBC metadata to object information. Then you could supply other adapters, such as for XML documents. Then, two applications could sync up with one another through a quick exchange of XML and be up and running.

Persistence Engines

I promised an introduction to persistence engines. I won't bore you with a ton of details, but instead just an overview. Like so many things in Java, the developers come upon something that we've been doing for decades, but it's new to them, so they rechristen it. What we call "updating the database" Java developers call "persistence." And many of the things that we've taken into account over the years are such radical concepts that the Java folks tend to write them into their database engines manually.

In persistence engines, a good example of this is write caching. While OS/400 has evolved to the point where write caching is handled by the operating system in such a way that we don't even think of it, performance on individual write operations using JDBC is so poor that the Java folks have to come up with their own write caching. This is one of the things that makes the Hibernate database so fast. Unfortunately, one of the problems with this approach is that since the caching is entirely in the Hibernate software, if the Java Virtual Machine (JVM) gets canceled, you stand an unacceptably high chance of losing data.

Other methods have their own issues. You can find a number of persistence engines here. You'll see that they fall into a couple of basic types. The older engines simply provide O/R (object-to-relational) mapping, often XML-driven. Another significant segment of the solutions are JDO implementations. One particularly interesting one I found was something called JDBCPersistence, which claims to be able to create the required objects on the fly, generating the appropriate Java bytecode. That's something I'd like to see!

Next Steps

Now that you've looked into the world of persistence, especially as seen from the Java perspective, it's time for you to review your own data access needs. I highly recommend that you create some layer that exists between your application objects (whether they're Java objects or arrays of data structures in RPG) and your database. This layer can be used either in local applications on your host or in client applications. By doing this, you'll be able to better handle changes to your database.

Joe Pluta is the founder and chief architect of Pluta Brothers Design, Inc. He has been working in the field since the late 1970s and has made a career of extending the IBM midrange, starting back in the days of the IBM System/3. Joe has used WebSphere extensively, especially as the base for PSC/400, the only product that can move your legacy systems to the Web using simple green-screen commands. Joe is also the author of E-Deployment: The Fastest Path to the Web, Eclipse: Step by Step, and WDSC: Step by Step. You can reach him at This email address is being protected from spambots. You need JavaScript enabled to view it..

Joe Pluta

Joe Pluta is the founder and chief architect of Pluta Brothers Design, Inc. He has been extending the IBM midrange since the days of the IBM System/3. Joe uses WebSphere extensively, especially as the base for PSC/400, the only product that can move your legacy systems to the Web using simple green-screen commands. He has written several books, including Developing Web 2.0 Applications with EGL for IBM i, E-Deployment: The Fastest Path to the Web, Eclipse: Step by Step, and WDSC: Step by Step. Joe performs onsite mentoring and speaks at user groups around the country. You can reach him at This email address is being protected from spambots. You need JavaScript enabled to view it..

MC Press books written by Joe Pluta available now on the MC Press Bookstore.

Developing Web 2.0 Applications with EGL for IBM i Developing Web 2.0 Applications with EGL for IBM i
Joe Pluta introduces you to EGL Rich UI and IBM’s Rational Developer for the IBM i platform.
List Price $39.95

Now On Sale

WDSC: Step by Step WDSC: Step by Step
Discover incredibly powerful WDSC with this easy-to-understand yet thorough introduction.
List Price $74.95

Now On Sale

Eclipse: Step by Step Eclipse: Step by Step
Quickly get up to speed and productivity using Eclipse.
List Price $59.00

Now On Sale



Support MC Press Online

$0.00 Raised:

Book Reviews

Resource Center

  • SB Profound WC 5536 Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application. You can find Part 1 here. In Part 2 of our free Node.js Webinar Series, Brian May teaches you the different tooling options available for writing code, debugging, and using Git for version control. Brian will briefly discuss the different tools available, and demonstrate his preferred setup for Node development on IBM i or any platform. Attend this webinar to learn:

  • SB Profound WP 5539More than ever, there is a demand for IT to deliver innovation. Your IBM i has been an essential part of your business operations for years. However, your organization may struggle to maintain the current system and implement new projects. The thousands of customers we've worked with and surveyed state that expectations regarding the digital footprint and vision of the company are not aligned with the current IT environment.

  • SB HelpSystems ROBOT Generic IBM announced the E1080 servers using the latest Power10 processor in September 2021. The most powerful processor from IBM to date, Power10 is designed to handle the demands of doing business in today’s high-tech atmosphere, including running cloud applications, supporting big data, and managing AI workloads. But what does Power10 mean for your data center? In this recorded webinar, IBMers Dan Sundt and Dylan Boday join IBM Power Champion Tom Huntington for a discussion on why Power10 technology is the right strategic investment if you run IBM i, AIX, or Linux. In this action-packed hour, Tom will share trends from the IBM i and AIX user communities while Dan and Dylan dive into the tech specs for key hardware, including:

  • Magic MarkTRY the one package that solves all your document design and printing challenges on all your platforms. Produce bar code labels, electronic forms, ad hoc reports, and RFID tags – without programming! MarkMagic is the only document design and print solution that combines report writing, WYSIWYG label and forms design, and conditional printing in one integrated product. Make sure your data survives when catastrophe hits. Request your trial now!  Request Now.

  • SB HelpSystems ROBOT GenericForms of ransomware has been around for over 30 years, and with more and more organizations suffering attacks each year, it continues to endure. What has made ransomware such a durable threat and what is the best way to combat it? In order to prevent ransomware, organizations must first understand how it works.

  • SB HelpSystems ROBOT GenericIT security is a top priority for businesses around the world, but most IBM i pros don’t know where to begin—and most cybersecurity experts don’t know IBM i. In this session, Robin Tatam explores the business impact of lax IBM i security, the top vulnerabilities putting IBM i at risk, and the steps you can take to protect your organization. If you’re looking to avoid unexpected downtime or corrupted data, you don’t want to miss this session.

  • SB HelpSystems ROBOT GenericCan you trust all of your users all of the time? A typical end user receives 16 malicious emails each month, but only 17 percent of these phishing campaigns are reported to IT. Once an attack is underway, most organizations won’t discover the breach until six months later. A staggering amount of damage can occur in that time. Despite these risks, 93 percent of organizations are leaving their IBM i systems vulnerable to cybercrime. In this on-demand webinar, IBM i security experts Robin Tatam and Sandi Moore will reveal:

  • FORTRA Disaster protection is vital to every business. Yet, it often consists of patched together procedures that are prone to error. From automatic backups to data encryption to media management, Robot automates the routine (yet often complex) tasks of iSeries backup and recovery, saving you time and money and making the process safer and more reliable. Automate your backups with the Robot Backup and Recovery Solution. Key features include:

  • FORTRAManaging messages on your IBM i can be more than a full-time job if you have to do it manually. Messages need a response and resources must be monitored—often over multiple systems and across platforms. How can you be sure you won’t miss important system events? Automate your message center with the Robot Message Management Solution. Key features include:

  • FORTRAThe thought of printing, distributing, and storing iSeries reports manually may reduce you to tears. Paper and labor costs associated with report generation can spiral out of control. Mountains of paper threaten to swamp your files. Robot automates report bursting, distribution, bundling, and archiving, and offers secure, selective online report viewing. Manage your reports with the Robot Report Management Solution. Key features include:

  • FORTRAFor over 30 years, Robot has been a leader in systems management for IBM i. With batch job creation and scheduling at its core, the Robot Job Scheduling Solution reduces the opportunity for human error and helps you maintain service levels, automating even the biggest, most complex runbooks. Manage your job schedule with the Robot Job Scheduling Solution. Key features include:

  • LANSA Business users want new applications now. Market and regulatory pressures require faster application updates and delivery into production. Your IBM i developers may be approaching retirement, and you see no sure way to fill their positions with experienced developers. In addition, you may be caught between maintaining your existing applications and the uncertainty of moving to something new.

  • LANSAWhen it comes to creating your business applications, there are hundreds of coding platforms and programming languages to choose from. These options range from very complex traditional programming languages to Low-Code platforms where sometimes no traditional coding experience is needed. Download our whitepaper, The Power of Writing Code in a Low-Code Solution, and:

  • LANSASupply Chain is becoming increasingly complex and unpredictable. From raw materials for manufacturing to food supply chains, the journey from source to production to delivery to consumers is marred with inefficiencies, manual processes, shortages, recalls, counterfeits, and scandals. In this webinar, we discuss how:

  • The MC Resource Centers bring you the widest selection of white papers, trial software, and on-demand webcasts for you to choose from. >> Review the list of White Papers, Trial Software or On-Demand Webcast at the MC Press Resource Center. >> Add the items to yru Cart and complet he checkout process and submit

  • Profound Logic Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application.

  • SB Profound WC 5536Join us for this hour-long webcast that will explore:

  • Fortra IT managers hoping to find new IBM i talent are discovering that the pool of experienced RPG programmers and operators or administrators with intimate knowledge of the operating system and the applications that run on it is small. This begs the question: How will you manage the platform that supports such a big part of your business? This guide offers strategies and software suggestions to help you plan IT staffing and resources and smooth the transition after your AS/400 talent retires. Read on to learn: