Oh, and where's my TechTip?
I want to accomplish two things with this inaugural TechTip. Mainly, of course, I want to give you what you came here for, a TechTip, and we'll get to that shortly. Secondly, I want to lay out the course that this column will take so you will know what to expect and what I will be talking about here.
If you are not an Informix user currently, don't go away just yet. If you are not an AIX user but stumbled upon us, stay a while; you may like what you have found. This is, in specific, a column about using IBM's Informix Dynamic Server RDBMS on AIX, but more generally, it is about becoming better at using SQL (pronounced "Ess Kew El"; Sequel is another language and a topic for another article) running on a UNIX operating platform. Because of this fact, much of what you will read here will apply to you even if you are using MySQL on Linux or Oracle on another proprietary UNIX. Get comfy.
For those of you who have never heard of it and yet are still with me here, what is Informix? Informix was one of the early relational database systems that popped into being in the early 1980s. In the early 1990s, Informix introduced a completely revamped server technology that it called Informix Dynamic Server, or IDS. By the turn of the 21st Century, IDS had garnered a strong and loyal following among developers and database administrators for its speed, reliability, and low cost of ownership. When IBM purchased the technology in 2001, everyone prophesied the end of Informix. That did not happen. In the 11 years that IBM has owned IDS, they have published more releases and a richer suite of enhancements to the product than Informix Corp. did in the product's first 18 years.
Today, Informix Dynamic Server is IBM's go-to engine for the tough jobs. Need an enterprise-class database server that can be squeezed down to run inside a phone system and never be touched by human hands? Informix. Need a server that can natively handle massive streams of time-stamped data in real time, store it using one-third of the disk space your competitor will have to pay for, and present it to purely SQL-based tools? Informix. Need to build an OLTP application that can kick butt at transaction rates that will make your competitors' heads spin right off their shoulders? Build it over Informix.
This is starting to sound like a sales pitch, and I don't work for IBM, so let's move on to our tip.
I'm frequently asked about specific features offered by one RDBMS or another with reference to how one can accomplish the same thing in an Informix environment. From time to time, I'll cover one of these questions. This time, let's talk about hash indexing. Traditional btree/b+tree indexes are very good at processing range filters. A range filter is one that specifies a range of values for a given column that will satify the query. Here are a couple of examples:
- ... and salary between 90000 and 120000 (all records with salary greater than 90,000 and less than or equal to 120,000)
- ... and sales > 100 (all rows with sales above 100)
A btree index can easily find all of the rows in the match range. However, if you have an equality filter (say "and department_num = 42"), there are faster technologies than a btree index. One such is a hash table or hash index. The "hash" part of this technology is the application of a mathematical function that maps the values being hashed into a finite number of slots or buckets. A simple hash would be the modulo function. When you create a hash index, you select the hash function so that a relatively small number of possible column values will hash to a given hash value, making it very fast to find the records that contain a given value. The downside is that you cannot use a hash index for range filters, and many database systems cannot combine hash indexes with btree indexes to process multiple filters.
With the release of Informix version 11.70 last year, IBM introduced a new index type known as a Forest of Trees (FOT) index. It is a hybrid of hash and btree technology. An FOT index defines a key that begins with one or more columns to which a hash into a user-defined number of hash buckets is applied and then continues with a list of other key columns. Each hash bucket identifies a btree index containing the remaining key columns that follow the hashed portion of the key. If you do not specify any other columns after the hashed portion of the key, the index devolves into a hash index. If you have additional columns in the key, the FOT index experiences far less contention for access to btree root nodes than a traditional btree index, allowing more concurrency for large numbers of users. Each of the btrees in an FOT index will also be flatter than simple btree on the full key would be, resulting in fewer index accesses to get to the correct leaf. Since the non-hashed portion of the key is indexed by a btree, range filters on these columns are supported.
Informix v11.70 also supports a feature called Index Self-Join (I-SJ), which permits the optimizer to use an index to filter on key columns that are not the leading column(s) of the index. That can permit you to reduce the number of indexes on the table, thereby improving insert, update, and delete performance. In the case of FOT indexes, I-SJ means that Informix can use the btree sub-indexes of the FOT index as if you also had a separate index on just those columns.
You create a Forest of Trees index just like any other index but with an additional clause that defines the hash:
CREATE INDEX mytable_ix12 ON mytable( col1, col2, col3, col4 )
HASH ON (col1, col2) WITH 100 BUCKETS;
This creates one hundred small btree indexes each for a subset of the compound col1/col2 key that contain keys for col3 and col4. Searching with equality filters on col1 and col2 will find all of the rows with that compound value very quickly. It's noticably faster than searching a normal btree, and the search will be even faster if you also have filters on col3 and/or col4.