Practical DB2: Database Field-Naming Conventions

The good thing about naming conventions is that there are so many of them, and database field naming is just one of those areas of convention contention.

Put three programmers in a room to define a database and you'll end up with at least four different sets of naming conventions. And it won't be quick, either, no matter how much experience you have. I can't count the number of databases I've helped design over the years, yet I still find myself in a room sitting for hours every time I have to work on another one. Each database has its own idiosyncrasies, but a few areas are relatively common among all databases. This article will address one of the fundamental issues: field names.


Identifying the Business Requirement

Even though it seems like a pretty technical pursuit, you should think like an analyst when creating your programming conventions: Identify the business requirement, establish some success criteria, and then meet that goal. In the case of field-naming conventions, my goal is twofold: Make it easy to get data into my database, and make it easy to get data out. Naming conventions are becoming more important rather than less; increasingly, we're seeing users accessing data through ad hoc query tools, and those conventions make it easier on them as well. Today, I am going to use a very specific example to prove just how important those conventions can be. The example is quite straightforward: I'm going to add an order header record. When creating the record, I'm going to initialize several fields with values from the customer master. And while the example will be very simple, it will show how good naming conventions scale quickly and easily.



Over the years, we've seen more and more integration of SQL and traditional database I/O, but field naming is one area where the two don't always mesh well. IBM has done everything it can to allow the two to live together in harmony, but real consistency requires an attention to detail that sometimes just doesn't fall within our time frame. I absolutely believe that the benefits of DDL definition over DDS definition far outweigh any drawbacks, but those benefits aren't completely risk-free. To me, the biggest problem with SQL is that it lends itself to a very ad hoc development environment, and you have to work hard not to get caught up in it. The ability to add a field named EXTRA_INFO_FOR_JANET with a simple SQL statement has some significant ramifications, and I'll address those in a follow-up article. For today, I'm going to focus on DDS environments.


To Refer or Not to Refer

Today's example will use a field reference file, albeit a very abbreviated one. I think field reference files are a critical component to any good database design, and I hope to spend more time on the concept a little later. Here's our reference file, named REFFILPF:


R RREFFIL
  NAME         30          TEXT('NAME')
  ORNO         10S 0       TEXT('ORDER NUMBER')
  ORDTYP        1A         TEXT('ORDER TYPE')
  PHONE        15          TEXT('PHONE NUMBER')
  ADDR         30          TEXT('ADDRESS')
  CITY         25          TEXT('CITY')
  STATE         3          TEXT('STATE')
  ZIP           9          TEXT('ZIP CODE')
  FAX          15          TEXT('FAX')
  EMAIL        64          TEXT('EMAIL ADDRESS')


The field reference file defines your database attributes at a very basic level: customer number, address, phone number. The basic lengths and types are here. Other files then reference those fields in their definitions. Here's our customer master file, CUSMASPF:



                           REF(REFFILPF)
R RCUSMAS
  CMCUST     R             REFFLD(CUST)
  CMNAME     R             REFFLD(NAME)
  CMADDR1    R             REFFLD(ADDR)
  CMADDR2    R             REFFLD(ADDR)
  CMCITY     R             REFFLD(CITY)
  CMSTATE    R             REFFLD(STATE)
  CMZIP      R             REFFLD(ZIP)
  CMPHONE    R             REFFLD(PHONE)
  CMFAX      R             REFFLD(FAX)
  CMEMAIL    R             REFFLD(EMAIL)


You'll probably notice a couple of things right off the bat. First, I use a two-character prefix for every field. This prefix identifies the file. As we'll see later, it's not strictly necessary, and in fact there is a school of thought that eschews prefixes. Personally, I prefer them because they allow old-school programmers to use the files without a chance of field-name collision. There are nearly 1,000 of these prefixes (a letter followed by a letter or digit gives 26 × 36 = 936 combinations), so you ought to have no problem coming up with a unique identifier for each database file. Once you've made that decision, field naming becomes quite simple: The name of the field in the database file is simply the file's prefix followed by the name of the referenced field.


You'll probably also notice one anomaly here. While there is only one address field in the reference file, we have two address fields in the customer master. That's not a problem; we create fields CMADDR1 and CMADDR2, but both refer to the same ADDR field in the field reference file. While we try to keep the field names consistent, exceptions like this are very easy to handle. OK, this example also has an order header, so let's define that next:



                           REF(REFFILPF)
R RORDHDR
  OHORNO     R             REFFLD(ORNO)
  OHCUST     R             REFFLD(CUST)
  OHNAME     R             REFFLD(NAME)
  OHPHONE    R             REFFLD(PHONE)


Look closely and you'll see several fields that refer to the same reference fields as fields in the customer master. This is no accident; the design for this particular database calls for the customer name and phone number to be included in the order header. It may be that they have to be modified under certain circumstances, or they may just have to be there for other processing. Whatever the case, you can see that these fields will need to be populated from the corresponding fields in the customer master.


And Now for the Programming Magic

Sometimes, the preparation is more dramatic than the payoff, and this is probably one of those cases. The code that I'm going to show you is really very simple.


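ctl-opt dftactgrp(*no) actgrp(*new);

dcl-f CUSMASPF keyed;
dcl-f ORDHDRPF usage(*output);

// Entry parameters: customer number and order number
dcl-pi *n;
  iCust like(dsORDHDR.CUST);
  iOrno like(dsORDHDR.ORNO);
end-pi;

// I/O data structures; PREFIX('':2) strips the CM and OH file prefixes
dcl-ds dsCUSMAS extname('CUSMASPF':*input)
       prefix('':2) qualified;
end-ds;

dcl-ds dsORDHDR extname('ORDHDRPF':*output)
       prefix('':2) qualified inz;
end-ds;

dsORDHDR.ORNO = iOrno;
chain (iCust) CUSMASPF dsCUSMAS;
eval-corr dsORDHDR = dsCUSMAS;   // copy every same-named subfield
write RORDHDR dsORDHDR;

// End the program
*inlr = *on;
return;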



The first line is the control options, nothing special there (although in a production environment you would probably use something other than a *NEW activation group). The next two lines define the files: The customer master is input, the order header is output. The next block of code defines the parameters: The program receives a customer number and an order number and is supposed to create an order header for that customer. The next block does the setup by creating two data structures suitable for I/O. One is used to read data from the customer master, the other to write data to the order header. The trick is the PREFIX('':2) keyword on both data structures, which removes the first two characters of every field name. Now, rather than CMCUST and OHCUST, both data structures simply have a subfield named CUST. Two definitions of the same name would normally be rejected by the compiler, so to avoid the collision I also specified QUALIFIED on both; the subfields are then referenced as dsCUSMAS.CUST and dsORDHDR.CUST. The one difference is that I also specified INZ on the dsORDHDR data structure; this sets any numeric fields to zeros and avoids decimal data errors.


So now that the setup is all done, the code is pretty anticlimactic. I store the order number in the order header. I then chain to the customer master. The magic is EVAL-CORR, which copies every subfield of dsCUSMAS that has a same-named, type-compatible subfield in dsORDHDR. Here, those fields are CUST, NAME, and PHONE. Then I write the record. That's all there is to it.
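To see the EVAL-CORR behavior in isolation, here is a minimal, self-contained sketch that swaps the two externally described structures for hand-coded stand-ins, so it compiles without the physical files. The subfield lists mirror what PREFIX('':2) would leave behind. The 7-digit zoned definition of CUST and all of the sample values are assumptions made for illustration; they are not taken from the reference file.

```rpgle
**free
ctl-opt dftactgrp(*no) actgrp(*new);

// Hand-coded stand-in for dsCUSMAS once the CM prefix is stripped.
// ASSUMPTION: CUST is shown as 7-digit zoned; its real definition
// is not given in the article.
dcl-ds dsCUSMAS qualified;
  CUST  zoned(7:0);
  NAME  char(30);
  ADDR1 char(30);
  PHONE char(15);
end-ds;

// Hand-coded stand-in for dsORDHDR once the OH prefix is stripped.
dcl-ds dsORDHDR qualified inz;
  ORNO  zoned(10:0);
  CUST  zoned(7:0);
  NAME  char(30);
  PHONE char(15);
end-ds;

// Sample values standing in for a successful CHAIN.
dsCUSMAS.CUST  = 1001;
dsCUSMAS.NAME  = 'ACME SUPPLY';
dsCUSMAS.ADDR1 = '1 MAIN ST';
dsCUSMAS.PHONE = '555-0100';

dsORDHDR.ORNO = 5000123;

// Copies CUST, NAME, and PHONE. ORNO keeps its value because dsCUSMAS
// has no ORNO subfield; ADDR1 is skipped because dsORDHDR has no ADDR1.
eval-corr dsORDHDR = dsCUSMAS;

// Without EVAL-CORR, the same result takes one statement per field:
//   dsORDHDR.CUST  = dsCUSMAS.CUST;
//   dsORDHDR.NAME  = dsCUSMAS.NAME;
//   dsORDHDR.PHONE = dsCUSMAS.PHONE;

dsply ('CUST ' + %char(dsORDHDR.CUST));
dsply ('NAME ' + %trim(dsORDHDR.NAME));

*inlr = *on;
return;
```

The point of the stand-ins is only to make the name matching visible; in the real program the compiler builds both structures from the file definitions, so nobody maintains these subfield lists by hand.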


Now, you might think that's an awful lot of work to avoid three EVAL statements, and you're right, it is. In this simple situation, the setup probably exceeds the savings. But the payoff comes when I decide I want to add the email address. All I do is add the field OHEMAIL to the ORDHDRPF file and recompile the program. That's it; the move happens automatically. If I need more fields, I just add them and recompile. If I need fields from another file, I just add the file and a corresponding data structure, add the chain, and add another EVAL-CORR. This technique is wonderfully scalable and frankly a lot easier than even SQL.
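The scaling argument can be sketched the same way, again with hand-coded stand-ins so the example is self-contained. Everything about the second file here is hypothetical: a sales rep master with fields SRREP and SRTERR, and matching OHREP and OHTERR fields added to the order header, are invented purely to illustrate the pattern, as are the CUST definition and the sample values.

```rpgle
**free
ctl-opt dftactgrp(*no) actgrp(*new);

// Stand-in for dsCUSMAS (CM prefix already stripped).
// ASSUMPTION: the definition of CUST is not given in the article.
dcl-ds dsCUSMAS qualified;
  CUST  zoned(7:0);
  NAME  char(30);
  PHONE char(15);
  EMAIL char(64);
end-ds;

// Stand-in for a HYPOTHETICAL second file, a sales rep master whose
// fields SRREP and SRTERR become REP and TERR once the prefix is gone.
dcl-ds dsSLSREP qualified;
  REP  char(3);
  TERR char(5);
end-ds;

// Stand-in for dsORDHDR after OHEMAIL and the hypothetical OHREP and
// OHTERR fields have been added to the order header file.
dcl-ds dsORDHDR qualified inz;
  ORNO  zoned(10:0);
  CUST  zoned(7:0);
  NAME  char(30);
  PHONE char(15);
  EMAIL char(64);
  REP   char(3);
  TERR  char(5);
end-ds;

// Sample values standing in for the two CHAIN operations.
dsCUSMAS.CUST  = 1001;
dsCUSMAS.NAME  = 'ACME SUPPLY';
dsCUSMAS.PHONE = '555-0100';
dsCUSMAS.EMAIL = 'orders@example.com';
dsSLSREP.REP   = 'JQP';
dsSLSREP.TERR  = 'MW01';

dsORDHDR.ORNO = 5000123;

eval-corr dsORDHDR = dsCUSMAS;   // same statement, now also moves EMAIL
eval-corr dsORDHDR = dsSLSREP;   // one new statement per additional file

dsply ('EMAIL ' + %subst(dsORDHDR.EMAIL:1:40));
dsply ('REP ' + dsORDHDR.REP + ' TERR ' + %trim(dsORDHDR.TERR));

*inlr = *on;
return;
```

One thing to watch with more than one source structure: because matching is purely by name, a subfield that appears in both sources (a NAME in each file, say) would be overwritten by whichever EVAL-CORR runs last.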


And it all starts from a good, solid set of field-naming conventions! Next, I'll show how to do much of this same work through DDL rather than DDS.