The Domino DB2 Debate: Why No System i DB2 Domino Data Store?

Commentary

Normally, I wouldn't comment on this kind of reporting, because every analyst has his or her own perspective on what's important in a software release. In this case, however, I feel I must weigh in, because so many of the underlying facts in the reporter's story have been oversimplified to the point of distorting the issue. In my view, the whole uproar does a great disservice to the architectures of both Domino and System i.

Notes/Domino DB2 Integration

Starting with Notes/Domino Release 7, now two years old, Lotus permitted customers to build Lotus Notes applications that could use the DB2 UDB data store instead of the native NSF document database format. This option was in addition to the well-documented and widely implemented APIs, connectors, and services that already existed within the Notes/Domino realm.

Analysts have hypothesized that Lotus was instructed to create this additional function within the Notes/Domino product line to satisfy a competitive marketing point against Oracle and Microsoft, and they have questioned the need for that added functionality.

Nonetheless, DB2 UDB data store functionality did arrive with Release 7 of Notes/Domino, though Lotus imposed stringent requirements on its use. For instance, the server had to be dedicated to Domino and could have no more than one DB2 instance and one DB2 database for the Domino server. Moreover, you needed a special product key from Lotus to enable this functionality.

A Domino System i DB2 Data Store?

No sooner had IBM Lotus made this announcement than System i customers speculated that Lotus would be bringing the same functionality to the System i's version of Notes/Domino.

Alas, it was not so easy to do. The reasons are complex and involve engineering tradeoffs that could, in my estimation, remove any real value for users, programmers, developers, or even marketers. Yet some System i customers felt that their beloved platform was being stiffed when System i DB2 support did not appear in Release 8 of Notes/Domino.

Understanding the Notes/Domino NSF Database

So let's start at the beginning, looking at the database structures themselves, to try to understand the complexity and the conundrum facing Lotus and the System i.

One of the erroneous statements made by this unnamed reporter about the native Lotus Notes/Domino NSF files is that NSF is a proprietary "flat file" structure. Nothing could be further from the truth.

NSF stands for Notes Storage Facility, and it's a unique and highly robust database structure, unlike anything else in the computing world. The design was revolutionary at its inception and so advanced for its time that the only comparable structure in use today might be XML.

Like XML, each "document" contains a complete schema of the data it holds, including data types and links to calculations, functions, macros, and APIs. That, however, is where the similarities end.

The Advanced Functionality of the NSF Database Format

The NSF database structure is so finely granular that, were you to copy a single record from a Notes NSF file into a completely new NSF file, all of its internally defined functions would transfer with it. By the same token, you can copy and paste all of the functions separately, without the data, into an empty NSF file and begin entering new data that would be managed exactly as the old data was. The NSF file is both backward- and forward-compatible with other releases of Domino (though functions introduced in newer releases will obviously not work in older ones).

NSF Portability

Portability between operating system platforms is a key design feature of NSF. In this respect, an NSF database file is more akin to a spreadsheet file: You can easily copy a spreadsheet's functions and formats from one file to another. Unfortunately, a spreadsheet can't always be copied between operating systems. An NSF database has no such limitation.

As in a spreadsheet, data entered into fields does not need to be "data typed" in advance. Unlike a spreadsheet, NSF provides rich-text fields that let you easily embed attachments or create richly formatted data.

Moreover, fields containing "null data" consume no disk storage at all. Each NSF record can also incorporate Java and JavaScript programs, as well as links to powerful LotusScript object-oriented programming agents.
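The self-describing, sparse character of an NSF document can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the real NSF on-disk format, which is far more elaborate; the `Item` and `Document` classes and the `copy_to` method are hypothetical names for illustration. It shows two ideas from the text: each document carries its own field names and types, so it can be copied into another database intact, and an empty field is simply never stored.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Item:
    """One named field in a document; its type travels with its value."""
    name: str
    type_name: str
    value: Any


@dataclass
class Document:
    """Toy document-centric record that stores only the items it actually has."""
    items: dict[str, Item] = field(default_factory=dict)

    def set(self, name: str, value: Any) -> None:
        if value is None:
            # An empty ("null") field is simply not stored, so it takes no space.
            self.items.pop(name, None)
        else:
            # No type is declared up front; the item records whatever type the value has.
            self.items[name] = Item(name, type(value).__name__, value)

    def get(self, name: str, default: Any = None) -> Any:
        item = self.items.get(name)
        return item.value if item is not None else default

    def copy_to(self, target_db: list["Document"]) -> "Document":
        """Copy this document into another database; its items and types go with it."""
        clone = Document({key: Item(i.name, i.type_name, i.value) for key, i in self.items.items()})
        target_db.append(clone)
        return clone


memo = Document()
memo.set("Subject", "Quarterly review")
memo.set("Amount", 1250.75)
memo.set("Approver", None)                # never stored
new_db: list[Document] = []
copied = memo.copy_to(new_db)
print(sorted(copied.items))               # ['Amount', 'Subject']
print(copied.items["Amount"].type_name)   # float
```

Because the type information lives in each item rather than in a shared table definition, two documents in the same database are free to carry different sets of fields.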

To mistake NSF for a "flat-file format," as that reporter did, is to miss the essence of just one of the advantages of a Notes/Domino database: It is a highly functional, extremely portable rapid application development environment that consistently breaks down the traditional barriers to user-generated code. A moderately skilled user can develop a fully functioning database on the fly and then "evolve" its structure as new functions are needed. That user's investment in the data is then preserved, within each record of the database, for all subsequent versions of the middleware, and the data can be ported, undisturbed, between versions of Domino or between operating system platforms.

NSF Is Not a Relational Database Structure

Nonetheless, the NSF format is not relational, but document-centric. Since the database schema is embedded in each record—and different records can contain different versions of the schema—DB2 database administrators and programmers often find NSF a confusing structure with which to work when connecting Notes/Domino to their other database services. In response to this conundrum, Lotus developed external services and APIs to make the transformation simpler.

For instance, NotesSQL is an ODBC driver that lets users of Microsoft tools build SQL queries against an NSF file. NotesSQL dynamically restructures the NSF data so that it appears to those users as tables they can work with in an Access database or an Excel spreadsheet. Other services, called Lotus Connectors, extend the reach of NSF into nearly every other realm of database connectivity, including System i DB2 and DB2 UDB.
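To see why a document-centric store needs a translation layer before SQL tools can use it, consider what any such layer must do: present documents with differing fields as rows with one fixed set of columns. The Python sketch below is a hypothetical illustration of that idea only; it is not how NotesSQL is actually implemented, and `flatten_to_table` is an invented name.

```python
from typing import Any


def flatten_to_table(documents: list[dict[str, Any]]) -> tuple[list[str], list[tuple[Any, ...]]]:
    """Present schema-less documents as a fixed-column table.

    The columns are the union of every field name seen, in first-seen order;
    a document that lacks a field gets None (SQL NULL) in that column.
    """
    columns = list(dict.fromkeys(name for doc in documents for name in doc))
    rows = [tuple(doc.get(name) for name in columns) for doc in documents]
    return columns, rows


docs = [
    {"Customer": "Acme", "Phone": "555-0100"},       # created with an older design
    {"Customer": "Globex", "Phone": "555-0199",
     "Email": "ops@globex.example"},                  # a newer design added Email
]
columns, rows = flatten_to_table(docs)
print(columns)  # ['Customer', 'Phone', 'Email']
print(rows[0])  # ('Acme', '555-0100', None)
```

The gaps filled with None are exactly the schema differences between documents that a relational DBA sees as confusing.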

The File System Dilemma

OK, so the reporter misunderstood the nature of NSF. No big deal, right? Everyone makes a slip now and then.

However, in the same paragraph, the author goes on to call DB2 UDB a "variant" of the System i DB2 database. And indeed, while on the surface DB2 UDB offers much of the same (and sometimes more) functionality as the System i's DB2, there are significant engineering differences in how that functionality is achieved. To understand those differences, we have to look at the file systems employed by the different operating systems and at the nature of DB2 itself.

On a PC, AIX, or Linux box, files are stored like nuggets in contiguous logical spaces. When an application begins to process a file, it uses its own logic to access, analyze, and interrogate the file's contents. For instance, you can open a Microsoft Word file with Microsoft Excel simply by changing the .doc extension to .xls. Of course, Excel will choke as soon as it starts looking for the familiar .xls structure, but you can still force the file to open.

Excel will futilely try to make sense of the file's structure but will ultimately deliver a spurious result, corrupting the file if you save it. That is because the program interrogates the file itself for the instructions it needs to understand the file's contents. (This is also why macro viruses were once such a prevalent threat in Microsoft Office documents. Similar vulnerabilities exist in AIX and Linux, though they aren't quite so prevalent.)
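The point that a program identifies a stream file by interrogating its contents, not its name, can be illustrated with a small Python sketch that checks a file's leading bytes against a few well-known signatures. The `sniff` function and the short signature table are illustrative assumptions; real applications inspect far more than the first few bytes. Note that legacy .doc and .xls files share the same OLE2 container signature, so a container-level check alone cannot tell them apart.

```python
import tempfile
from pathlib import Path

# A few well-known leading-byte signatures ("magic numbers").
SIGNATURES: list[tuple[bytes, str]] = [
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "OLE2 compound file (legacy .doc or .xls)"),
    (b"PK\x03\x04", "ZIP container (.docx, .xlsx, and others)"),
    (b"%PDF-", "PDF document"),
]


def sniff(path: Path) -> str:
    """Guess a stream file's format from its first bytes; the extension is ignored."""
    with path.open("rb") as f:
        header = f.read(8)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "report.xls"          # the name claims Excel...
    fake.write_bytes(b"%PDF-1.4\n% sample")  # ...but the bytes say PDF
    print(sniff(fake))                       # PDF document
```

Nothing outside the program stops it from opening the file; whether the result makes sense depends entirely on what the program finds inside.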

The same process occurs when an AIX, a Linux, or a Windows program accesses a DB2 UDB file: It opens the file and then begins interrogating the structure for clues on how to process the contents.

This is not the process by which i5/OS opens a DB2 database file on the System i.

Differences in DB2

DB2 UDB is not a "variant" of the System i DB2 structure, but a completely different code base that provides functions similar to those of DB2 on the System i.

When IBM Software began developing DB2 UDB for AIX and Windows, it was attempting to replicate the functions of DB2 on the mainframe (not the System i). At the same time, the engineers working on the AS/400's database (then known as the "AS/400 integrated database") were bringing it into closer compliance with the mainframe's functions. These two standardization efforts led IBM to rebrand DB2 as a "standard" set of functions that it labeled DB2 Universal Database, or DB2 UDB. The standard was a template for functional compliance, not a shared code base.

On the PC or on an AIX box, a DB2 UDB file is self-contained and can coexist with other database file formats within the directory services of the operating system. This is not the case for System i DB2.

Though the list of functions provided by DB2 UDB is similar to that provided by today's System i DB2, how those functions are delivered to the user program is significantly different. Why?

Single-Level Store and the System i DB2 Database

The answer can be summed up in a single statement: The System i uses integrated single-level store! Unlike other operating systems that treat disk storage as contiguous logical data streams, the System i (and its predecessors, the iSeries, AS/400, and System/38) treats all storage, both main memory and disk, as a single addressable space in which data and code may be scattered throughout the combined resource. i5/OS treats all secondary storage as a single pool of data rather than as a collection of separate pools (file systems), as is usual on UNIX-like systems and Microsoft Windows. It intentionally scatters the pages of every object across all disks so that many disk arms can work in parallel, storing and retrieving objects much more rapidly.
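A toy model can make single-level store more concrete. In the Python sketch below, a program receives only a virtual address, while the store decides which disk holds each page and spreads one object's pages across every disk. The 4,096-byte page size, the round-robin placement, and the `SingleLevelStore` class are all assumptions for illustration; this is not how i5/OS storage management is actually implemented.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this toy model


@dataclass(frozen=True)
class PageLocation:
    disk: int
    slot: int


class SingleLevelStore:
    """Toy model: one flat address space whose pages are spread over every disk.

    Programs only ever see virtual addresses; which disk holds a given page
    is decided (and remembered) by the store, not by the application.
    """

    def __init__(self, disk_count: int) -> None:
        self.disk_count = disk_count
        self.page_map: dict[int, PageLocation] = {}  # virtual page -> physical location
        self.next_slot = [0] * disk_count             # next free slot on each disk
        self.next_disk = 0

    def allocate(self, size: int) -> int:
        """Reserve pages for `size` bytes and return the starting virtual address."""
        first_page = len(self.page_map)
        page_count = -(-size // PAGE_SIZE)  # ceiling division
        for page in range(first_page, first_page + page_count):
            disk = self.next_disk
            self.page_map[page] = PageLocation(disk, self.next_slot[disk])
            self.next_slot[disk] += 1
            self.next_disk = (disk + 1) % self.disk_count  # round-robin: an assumption
        return first_page * PAGE_SIZE

    def locate(self, address: int) -> tuple[PageLocation, int]:
        """Translate a virtual address into its physical page and the offset within it."""
        page, offset = divmod(address, PAGE_SIZE)
        return self.page_map[page], offset


store = SingleLevelStore(disk_count=3)
base = store.allocate(10_000)  # needs 3 pages
disks = sorted({store.locate(base + p * PAGE_SIZE)[0].disk for p in range(3)})
print(disks)  # [0, 1, 2]
```

A single 10,000-byte object ends up on three different disks, yet the program that allocated it only ever holds one address.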

Access to this completely virtual realm is provided through a separate set of instructions called the Technology Independent Machine Interface, or TIMI. The System i database is integrated into the TIMI, along with embedded object security. Unlike on AIX or Windows, you cannot change a System i DB2 file's attributes so that a non-DB2 program can open it. Why not? Because the System i actually assembles the structure of the data on demand, out of a single-level storage map, as a function of the i5/OS operating system. The operating system checks for security, corruption, and consistency before presenting the data to the application, which lets it maximize storage and performance while ensuring the validity of the database object itself. Rather than having the program read and write a file on disk as it progresses, the operating system assembles a fresh in-memory "page" facsimile of the information from the required parts of the overall single-level store.
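The contrast between reading a stream file and asking the operating system for a database record can be sketched as two functions. This is a hypothetical Python illustration, not i5/OS code: `DatabaseObject`, `read_record`, and the `CUSTMAST` file are invented, and the checks shown (object type, user authority, record existence) are a simplified stand-in for the validation the text describes. The `"*FILE"` label mirrors the i5/OS object type for database files.

```python
from dataclasses import dataclass


class AuthorityError(PermissionError):
    """Raised when the caller lacks authority to the object."""


@dataclass
class DatabaseObject:
    """Toy stand-in for a database file object managed by the operating system."""
    name: str
    object_type: str
    authorized_users: set[str]
    fragments: dict[int, list[bytes]]  # record number -> pieces, wherever the store put them


def read_stream(raw: bytes, offset: int, length: int) -> bytes:
    """Stream-file style: return whatever bytes sit at the offset, no questions asked."""
    return raw[offset:offset + length]


def read_record(obj: DatabaseObject, user: str, record_no: int) -> bytes:
    """Object style: validate the request before any data reaches the program."""
    if obj.object_type != "*FILE":
        raise TypeError(f"{obj.name} is not a database file")
    if user not in obj.authorized_users:
        raise AuthorityError(f"{user} is not authorized to {obj.name}")
    if record_no not in obj.fragments:
        raise KeyError(f"record {record_no} not found in {obj.name}")
    # Join the stored pieces into one contiguous record for the caller.
    return b"".join(obj.fragments[record_no])


custmast = DatabaseObject("CUSTMAST", "*FILE", {"ALICE"}, {1: [b"ACME ", b"555-0100"]})
print(read_record(custmast, "ALICE", 1))  # b'ACME 555-0100'
```

In the stream version, the program alone decides what the bytes mean; in the object version, the system refuses the request before the program ever sees data it should not.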

How Does an NSF File Exist on the System i?

Now, this raises an interesting question: If DB2 data is scattered all over the System i's address space through single-level store, how does single-level store affect an NSF database on the System i? Wouldn't NSF data likewise be scattered from pillar to post?

Indeed, when IBM Lotus first began investigating the idea of porting Notes/Domino to the System i's predecessor, the AS/400, they were faced with the dilemma of how the AS/400's operating system was going to manage the NSF database files. They didn't want to write a completely new set of TIMI instructions, and there was no way to maintain NSF portability between operating systems if they wrote a completely new variation of the NSF format.

The answer to this dilemma was a separate directory service that permitted the AS/400 to store so-called UNIX-like "stream files" on the system. This directory service is called the Integrated File System (IFS), and that is where NSF files reside on the System i today. The IFS permits nearly any kind of UNIX or Windows file to be stored as a contiguous stream of data, with much looser security and consistency checks than those of System i DB2.

NSF Databases on the IFS

The IFS directory service mimics the tree structure of AIX, Linux, and Windows directory services as a kind of "file system within a file system." The IFS is one of those unique System i facilities that enables encapsulated databases, like NSF or Microsoft Access, to exist within the protective shell of the operating system while still preserving the deeper resources of single-level store and addressability.

The result of this masterful directory service is that System i DB2 data is accessed using the optimized dynamic algorithms of the i5/OS operating system, while files contained within the IFS directory are accessed through a different instruction route entirely.

In theory, a complete AIX- or Windows-style DB2 UDB database could reside within the IFS, though it is unlikely that i5/OS would permit it to be accessed with traditional RPG, whose native file operations work against System i database objects rather than IFS stream files.

Forest for the Trees

To draw an analogy for how these two different directory services function, think of a deep forest of data and code.

The IFS represents a single pathway through this forest that leads to a single, contiguous file of "stream data" that a PC or UNIX application uses to obtain its data.

By comparison, an RPG program working with System i DB2 is given a satellite view of the entire forest of data and then instructs the operating system to pick out all the bits and bytes required to present the DB2 file exactly as the program needs it.

In other words, the System i's strategy for working with DB2 data is global in nature, while the IFS represents a well-worn trail to one specific stream of data.

Conundrum: Notes/Domino Data Inside System i DB2?

So here are the final engineering conundrums that Lotus faces when considering making Notes/Domino work within this dynamic teapot of System i DB2 functionality:

If Lotus were to engineer a DB2 store for Notes/Domino within the System i's DB2 structure, how would that store actually work?

Would the structure work with the dynamic assembly of a single-level store DB2? Or would it attempt to build a DB2 UDB structure within the IFS?

If it chose the latter method, would Rochester then need to identify a DB2 UDB program instruction set that's separate from the standard System i DB2 instruction set? How would that impact Notes/Domino file portability?

How would the security processes (called Access Control Lists, or ACLs, in Lotus) be reconciled with i5/OS security? Which security system would take precedence?

All of these questions become important as soon as Lotus steps off its well-trodden technical path of IFS "stream data," and answering them would certainly require very detailed work with the i5/OS engineers.

Benefits?

And what would be the final benefit to the user? Notes/Domino can already access System i DB2 data through connectors and other APIs. The same is true of System i programs needing to access NSF data.

The benefits would be, by my estimation, very limited. But the cost in man-hours of engineering, at both Lotus and IBM Rochester, could be enormous.

Engineering Lotus Domino Solutions

Last April, I had the opportunity to interview Jim Colson, IBM Lotus Chief Architect and Distinguished Engineer for Notes/Domino. At the time, I was trying to satisfy my curiosity about how Lotus could, release after release, deliver a new piece of middleware that was nearly 100 percent cross-platform compatible while consistently maintaining 100 percent backward compatibility with previous releases.

For instance, there are sites today running NSF databases built with Release 4.5 of Domino, and they function just as they did the day they were built, though they may now reside on Linux or Windows or System i platforms. What, I asked, was Lotus' secret to this marvel of portable, cross-platform software architecture?

The Lotus Secret to Notes/Domino Application Portability

Colson expounded eloquently upon Lotus' experience working with virtual memory and its long history of building Notes/Domino around the concept of hardware and operating system abstraction. The Lotus goal, as I understood it, was to build middleware that offered the unique advantages of instruction set independence and the separation of software lifecycle from the underlying hardware.

Consider, for instance, that your company invests in an application each time it adds new data to a database. What is the value of a piece of code if, each time a new operating system or application version is released, IT must rebuild, convert, or reconfigure the data that resides on the system? With Notes/Domino, that has never been a problem, and it's Lotus' goal to keep that problem from ever occurring.

Portability is Essential

In Lotus' view, software portability—between operating systems and/or version releases—is not a widget that is added as an afterthought. It is an integral element in the design of the software that must be architected from the moment that a piece of middleware is conceived.

Just as the System i has succeeded over the years because it has used the TIMI to abstract the application layer and the operating system away from the hardware instruction set, so too has Lotus abstracted the application layers of Notes/Domino away from the hardware instruction set and from the variations between individual releases. Applications written for one piece of hardware using Lotus Notes/Domino should, in most cases, be transportable to the next hardware or software release without modification, conversion, or alteration.

We in the System i SMB sector all know from experience that this is not necessarily true of Windows, AIX, Linux, or other operating system platforms. It's one of the things that has allowed the System/38 architecture to evolve successfully into the architecture of the AS/400, iSeries, and System i, and it's what we have come to expect of all our applications.

In a like manner, the Lotus Notes/Domino NSF database structure abstracts the application layer from the underlying operating system, so that an NSF file created on one operating system at one point in time may be easily ported, without modification, to another operating system running a newer version of Notes/Domino. We would expect no less from IBM Lotus than we do from IBM System i.

But if we begin demanding that Lotus modify the structure of Notes/Domino so that it might interact with a new, unique System i instruction set, wouldn't we be destroying the very nature of portability that we demand?

Moreover, wouldn't it be ironic if, in trying to force this artificial compliance, we diverted the development resources of both the System i team and the Lotus team for a very limited gain in System i functionality?

That is, in my opinion, the nature of this tempest in a teapot: To force Lotus to build a Domino DB2 data store for the System i would run counter to the value of Notes/Domino portability and platform-independence.

Thomas M. Stockwell is Editor in Chief of MC Press Online, LP.
