18
Thu, Apr
5 New Articles

Relational Database Design -- Part 1

Change Management
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

More and more, a company's operational and strategic work depends on data processing. It is now apparent that data is itself an important company resource. Since a company's information on its products, customers, suppliers, employees, competitors, and so on can provide a significant competitive advantage, we must start to manage data as a company resource, while at the same time improve the way it is handled relative to programming. It becomes clear that some sort of uniform system is needed. A relational database environment, with its supporting software and accompanying management techniques, provides the framework for treating data as a standardized, manageable, shareable resource.

The Good Old Days

We still tend to concentrate on the how (code), as opposed to the what (data and the structure of that data). Programming (coding) is the fun part for most of us; designing an application, especially the database, seems too much like work. The majority of us began our programming lives on a flat file system, the S/34 or S/36. Little, if any, thought was given to the design of our files -- the code was what mattered. With the advent of the S/38, and more recently the AS/400, we were given a built-in relational database. But all those years of thinking about the code and not the data have taken their toll. Even today, when we are all supposed to know better, most of our time and emphasis (when it comes to applications development) is still spent on the code.

MIS departments are an ever-expanding entity. Rules change as data processing needs grow. It becomes apparent that in order to keep up with large backlogs of requests for new applications and enhancements to existing applications, application development has to take on a more standard, structured form. The amount of time spent maintaining existing applications becomes unacceptable as we attempt to keep ahead of these requests. It's clear that the way we handle -- or mishandle -- data is a major factor in the application maintenance mess in which we find ourselves.

There are numerous ways the design of our data files impacts new development and maintenance to existing applications. Most shops have one or more of the following problems somewhere in their system:

o Different files store the same data in different formats.

o Different applications store the same data is different files. This stores data redundantly in several files.

o Some programs are written in such a manner that if the way that the data is stored changes, the programs that access that data have to be modified in order to continue working correctly.

o Data is all too often inaccurate, inconsistent, or not up to date. Changes in everything from access methods to tax tables require programming changes.

Most data processing departments find it difficult to maintain standards in both programming and data storage. Programmers tend to go their separate ways when other programs are not directly dependent on their work. Unfortunately, a lack of standardization makes it difficult for anyone -- often even the original programmer -- to understand a program well enough to make required maintenance changes to it later on.

Setting Up Standards

A careful and standardized look at the entire application development process is required when different applications share data files. We need to use consistent data descriptions and access methods in our programs that affect the database and keep in mind that whatever one programmer does affects others, too. Standardization in programming techniques and storage schemes only serves to make for a less error-prone environment. This won't happen overnight, but rather takes place as a gradual transition.

In the independent flat file data processing environment, choice of file organization and access method usually decides the way that the application program is written. The program logic is dependent on the way in which the data is stored. If for any reason the storage characteristics of the data change, the program must be modified, often extensively. This should be avoided, due to the time and expense involved in such efforts. There are many reasons file structures should change to reflect improved storage techniques: new hardware, new software and performance tuning, to name just a few. One reason for the massive amounts of time spent on program maintenance is the typical level of dependency between programs and data.

What we would like to be able to do is separate storage and access considerations from programming as much as possible. The AS/400's Database Management System (DBMS) is a major step in the right direction in solving certain aspects of this problem.

The use of independent flat files should be looked at as both a starting point and a stopgap measure. It is clear that the independent flat file method of data storage, combined with individual applications implemented in a vacuum, carries with it a host of undesirable side effects. The solution, as we shall see more and more, is the database approach.

A Relational Database

Dr. Edgar F. Codd of IBM published a paper in 1970 entitled, "A Relational Model of Data for Large Shared Data Banks." This paper was the beginning of the relational database.

A company needs to keep track of many different pieces of information. Among these are employees, inventory, financial statistics, etc. All of this data must be organized in a manner that allows for fast easy retrieval, while at the same time avoids redundant storage. At a conceptual level, a database is a collection of this information. It's a collection of the data concerning a real world facet of the company.

The Parts of a Relational Database

A relational database is a DBMS implementation that is made up of relations (files), also known as tables. These tables are implemented as flat files, but through the use of keys are combined together to give very complex and sophisticated views of the data. A graphic representation of the parts of a relational database is shown in 1. A relation is made up of tuples (records, rows). A tuple is the description of a particular set of facts in a relational database. For example, an employee master record, a magazine subscriber record. While a relation is a description of a group of entities, a tuple is the description of a particular entity. Tuples are made up of attributes (fields, columns). An attribute is a description of one particular fact concerning a tuple. Just as a tuple is a description of a particular entity, an attribute is a description of one fact about the entity. For example, an employee's social security number, a magazine subscriber's address. Each attribute has a domain. The domain of an attribute is the allowable values of that attribute, e.g., an employee's salary must be numeric and be in the range of $1 to $100. From this point on, when referring to a relation (file) I will use the term table, when referring to a tuple (record) I will use the term row and when referring to an attribute (field) I will use the term column.

A relational database is a DBMS implementation that is made up of relations (files), also known as tables. These tables are implemented as flat files, but through the use of keys are combined together to give very complex and sophisticated views of the data. A graphic representation of the parts of a relational database is shown in Figure 1. A relation is made up of tuples (records, rows). A tuple is the description of a particular set of facts in a relational database. For example, an employee master record, a magazine subscriber record. While a relation is a description of a group of entities, a tuple is the description of a particular entity. Tuples are made up of attributes (fields, columns). An attribute is a description of one particular fact concerning a tuple. Just as a tuple is a description of a particular entity, an attribute is a description of one fact about the entity. For example, an employee's social security number, a magazine subscriber's address. Each attribute has a domain. The domain of an attribute is the allowable values of that attribute, e.g., an employee's salary must be numeric and be in the range of $1 to $100. From this point on, when referring to a relation (file) I will use the term table, when referring to a tuple (record) I will use the term row and when referring to an attribute (field) I will use the term column.

Uniquely Qualified

One requirement of relational databases is that each row in a table must be unique. Each row must be identified by a unique set of columns, known as a key.

There may be many combinations of columns that can uniquely identify a row, these are known as candidate keys. The primary key is the one candidate key chosen to function as the main identifier for the rows of the table. The key of one table can appear in another table, this is know as a foreign key. A foreign key is a column or columns of one table in a database that is the primary key of another table in the database. This foreign key is what allows one table to relate to another table in a database, hence the term relational database. One particular issue that foreign keys are closely tied to is referential integrity. Referential integrity is the issue of whether a data value that serves as the foreign key of a row in one table has a matching primary key value in another table.

Putting it all Together

A relational database must be capable of certain operations in order to qualify for the name relational. Among these operations are three very important ones: Select, Project and Join.

The Select operation allows us to retrieve a table that is the subset of another table. When we select a table, we take a horizontal piece of the table. In other words, we take selected rows from a table. For example, select all employees that have a salary greater that $1,000 an hour. This would retrieve a table that contain only the employees that meet the selection criteria: salary greater than $1,000. On the AS/400 there are many ways to achieve this: logical files, OPNQRYF, SQL/400, etc.

To Project a table, we take only selected columns from the table. When we project a table, we take a vertical piece of the table. In other words, we take selected columns from a row. For instance, retrieve the salary of all employees. This would produce a table that contains only the salary column. When I say that these operations produce a table, it doesn't mean that this table is a permanent table. It means that to an application that result appears as a table, but this table may only exist for that particular run of the program.

The Join operation is the most important and complex of the operations in a relational database. A join allows for the implementation of the purpose of a relational database, the relating of two or more tables. A join is the merging of related records from two or more files using keys as the link. For example, joining the employee master table with the employee insurance table using the column employee number as the join column. In the employee master table, the column employee number is the primary key and in the employee insurance file, the column employee number is a foreign key.

These operations can be combined in many different ways to produce very complicated and sophisticated views of your data. For instance, join the employee master file with the employee insurance file, selecting only those employees that have been employed more than two years, giving their name, insurance type and years of service.

Relationships

The ways in which different entities relate to each other are known as associations or relationships. These relationships are show in 2. Unary associations are when one entity relates to only one other entity, such as an employee and the employee's social security number. There is a one for one (unary) association in each direction: one employee, one social security number. This is also known as a one-to-one relationship.

The ways in which different entities relate to each other are known as associations or relationships. These relationships are show in Figure 2. Unary associations are when one entity relates to only one other entity, such as an employee and the employee's social security number. There is a one for one (unary) association in each direction: one employee, one social security number. This is also known as a one-to-one relationship.

A relationship can also exist that has unary association in one direction and a multiple association in the other direction. This is known as a one-to-many relationship. An example would be the association between an employee and the company. An employee works for one company but the company has many employees working for it. The final type of relationship has multiple associations in both directions. This is known as a many-to-many relationship. An example would be users and AS/400 libraries. There are many users authorized to a library and many libraries authorized to a user.

Data Redundancy

Data stored in a data processing system mirrors the realities of a business. In a data processing system, data is redundant if a particular fact about the real world is stored more than once. Let's take a look at the associated problems of redundant data and data integrity. First we will consider them in the context of one file, then in the context of many files. A example of nonredundant and redundant tables is shown in 3.

Data stored in a data processing system mirrors the realities of a business. In a data processing system, data is redundant if a particular fact about the real world is stored more than once. Let's take a look at the associated problems of redundant data and data integrity. First we will consider them in the context of one file, then in the context of many files. A example of nonredundant and redundant tables is shown in Figure 3.

Suppose that we are a health insurance company which must keep track of our accounts and agents. The Agent file is keyed on the unique column, Agent Number, and includes, as one of its other fields, the column, Agent Name. The Account file is keyed on the unique column, Account Number, and has, as one of its other fields, the Agent Number, (the number of the agent who services this account). There is one record per agent in the Agent file and each record contains information about a particular agent, such as agent number, name, location, telephone extension, and so on. There's a similar situation for the data on each account in the Account file.

This kind of data storage is said to be nonredundant, because the detail data on each entity is stored only once. If someone comes along and wants to know the name of the agent who is identified by a particular agent number, they merely look up that agent's record in the Agent file, using Agent Number as the key, and read the name. Similarly, if someone wants to know the number of the agent who is responsible for a particular account, that person may access the Account file and look up the account's record, using the Account Number as the key. So far, so good.

Now, what if someone comes along and wants to know the name of the agent who services a particular account, identified by account number. Can that information be obtained from only one of the files? Clearly it cannot, since the account number information exists only in the Account file, and the agent name information exists only in the Agent file. The only way to answer this question is to first look up the account's record in the Account file, find the number of the agent on that account, use that number to find the person's record in the Agent file, and then finally find his or her name in the record. That kind of custom-made, multi-command, multifile access is error-prone in terms of programming, and expensive in terms of execution performance.

Ah, but if we knew that we were going to make such queries, why did we bother breaking the data up into two files in the first place? If it were all contained in one file, in a meaningful way, then there would be no costly multi-command, multifile accesses. Notice that we show the agent assigned to a particular account, just as we did in the two files, but now the agent's other information is also carried along. We can still answer the two original simple questions that we put to the files, but now, in addition, we can answer the tougher, previously multifile question, "What is the name of the agent on a particular account?" with just the one file and one command. Fine! Or is it?

In this combined file, where all of the fundamentally different kinds of information are intermingled, the particulars of a specific agent must be repeated for every account serviced.

This situation of data redundancy causes a number of problems. For one thing, the data takes up a lot more storage space. For another thing, it causes extra work to update. If an agent's name changes, the change must be recorded many times. The time consumed in doing this can be very costly, particularly if the records involved are scattered all over a disk. Also, there is the nagging doubt that each of the occurrences was located and updated, due to anything from programming bugs to execution time system failures. When that happens, it's called a data integrity problem.

There appears to be a trade-off, but trade-offs are hardly rare in the computer field. In the two files, we have no redundancy, but a query of the type that we've been looking at requires a multi-command, multifile access. In the combined file, we have eliminated the need for a multi-command, multifile access for that type of query, but we have introduced data redundancy. Neither of these situations is tolerable, for the reasons indicated previously.

Can we have our cake and eat it, too? Yes, and this question brings us to one of the key features of the database environment. A true integrated data management system; that is, a true database system, is one in which data can be held nonredundantly. It also allows for a multifile query that requires a mixture of different kinds of data to be specified in a single command from the highest level programming interface. Any system that does not have this property really should not be called a database system -- a "file management" system would be a more accurate term.

Another benefit of the integrated approach is the encouragement that it gives users and programmers to find new uses for the data in the database. The richness of having so many different portions of the organization's data interconnected leads to many more usage possibilities than when it was stored separately.

Departmental Data

There is another kind of redundancy that, while perhaps a bit more obvious than the one previously discussed, is not necessarily easier to combat, but is difficult for different reasons.

Certain data is needed by various departments in a company during the course of normal, everyday work. For example, account name address may be needed by the sales department, the accounts receivable department and the credit department. There will also generally be other pieces of information involving accounts that are needed by only one or two departments. Typically the solution to this multiple need is redundancy. The sales department (or accounts receivable and credit departments) has its own stored file that, among other things, contains the account name and address.

While it's true that this keeps various departments from competing for the data, it opens up a can of worms reminiscent of the problems that we encountered in single file redundancy. First of all, storing a piece of data three times takes three times the amount of storage as storing it once. In addition, every time the data must be updated, it must be changed in each file in which it resides. This wastes processing time, causes control nightmares and can potentially lead to integrity problems if all of the data is not updated correctly and at the same time. Furthermore, the question arises as to whose responsibility it is to update data in all of the different files -- files that are probably owned by widely separated departments.

Wouldn't it make more sense for such data to be stored just once, in a way that makes it accessible to all of the different groups that need it? This way, there would be one group in charge of updating it, little wasted space, few integrity problems, and so on.

The database approach includes the concept of a common shared database with controlled management and update responsibility.

Summing It All Up

A relational database offers many advantages over the traditional methods of data storage. It allows for the storage of complex pieces of data concerning the real world activities of a company in a flexible, nonredundant manner. It also provides a set of operations that allows that data to be accessed in a myriad of ways. This approach enables a company to look at its data in ways that were never before possible. With a relational approach to data storage, a company can begin to manage its data the same way it manages its inventory - - as an asset and not a liability.


Relational Database Design -- Part 1

Figure 1 Database components

 Figure 1: Database Components DATABASE RELATION, TABLE, FILE TUPLE, ROW, RECORD ATTRIBUTE, COLUMN, FIELD DOMAIN 
Relational Database Design -- Part 1

Figure 2 Entity relationships

 Figure 2: Entity Relationships ONE <--------> ONE ONE <-------->> MANY MANY<<-------->> MANY 
Relational Database Design -- Part 1

Figure 3 Redundant vs Nonredundant tables

 Figure 3: Redundant vs. Nonredundant Tables NONREDUNDANT: Account Table: Account Account Account Agent Number Name Type Number Agent Table: Agent Agent Agent Agent Number Name Extension Location REDUNDANT: Account/Agent Table: Account Account Account Agent Agent Agent Number Name Type Name Extension Location 
BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$

Book Reviews

Resource Center

  • SB Profound WC 5536 Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application. You can find Part 1 here. In Part 2 of our free Node.js Webinar Series, Brian May teaches you the different tooling options available for writing code, debugging, and using Git for version control. Brian will briefly discuss the different tools available, and demonstrate his preferred setup for Node development on IBM i or any platform. Attend this webinar to learn:

  • SB Profound WP 5539More than ever, there is a demand for IT to deliver innovation. Your IBM i has been an essential part of your business operations for years. However, your organization may struggle to maintain the current system and implement new projects. The thousands of customers we've worked with and surveyed state that expectations regarding the digital footprint and vision of the company are not aligned with the current IT environment.

  • SB HelpSystems ROBOT Generic IBM announced the E1080 servers using the latest Power10 processor in September 2021. The most powerful processor from IBM to date, Power10 is designed to handle the demands of doing business in today’s high-tech atmosphere, including running cloud applications, supporting big data, and managing AI workloads. But what does Power10 mean for your data center? In this recorded webinar, IBMers Dan Sundt and Dylan Boday join IBM Power Champion Tom Huntington for a discussion on why Power10 technology is the right strategic investment if you run IBM i, AIX, or Linux. In this action-packed hour, Tom will share trends from the IBM i and AIX user communities while Dan and Dylan dive into the tech specs for key hardware, including:

  • Magic MarkTRY the one package that solves all your document design and printing challenges on all your platforms. Produce bar code labels, electronic forms, ad hoc reports, and RFID tags – without programming! MarkMagic is the only document design and print solution that combines report writing, WYSIWYG label and forms design, and conditional printing in one integrated product. Make sure your data survives when catastrophe hits. Request your trial now!  Request Now.

  • SB HelpSystems ROBOT GenericForms of ransomware has been around for over 30 years, and with more and more organizations suffering attacks each year, it continues to endure. What has made ransomware such a durable threat and what is the best way to combat it? In order to prevent ransomware, organizations must first understand how it works.

  • SB HelpSystems ROBOT GenericIT security is a top priority for businesses around the world, but most IBM i pros don’t know where to begin—and most cybersecurity experts don’t know IBM i. In this session, Robin Tatam explores the business impact of lax IBM i security, the top vulnerabilities putting IBM i at risk, and the steps you can take to protect your organization. If you’re looking to avoid unexpected downtime or corrupted data, you don’t want to miss this session.

  • SB HelpSystems ROBOT GenericCan you trust all of your users all of the time? A typical end user receives 16 malicious emails each month, but only 17 percent of these phishing campaigns are reported to IT. Once an attack is underway, most organizations won’t discover the breach until six months later. A staggering amount of damage can occur in that time. Despite these risks, 93 percent of organizations are leaving their IBM i systems vulnerable to cybercrime. In this on-demand webinar, IBM i security experts Robin Tatam and Sandi Moore will reveal:

  • FORTRA Disaster protection is vital to every business. Yet, it often consists of patched together procedures that are prone to error. From automatic backups to data encryption to media management, Robot automates the routine (yet often complex) tasks of iSeries backup and recovery, saving you time and money and making the process safer and more reliable. Automate your backups with the Robot Backup and Recovery Solution. Key features include:

  • FORTRAManaging messages on your IBM i can be more than a full-time job if you have to do it manually. Messages need a response and resources must be monitored—often over multiple systems and across platforms. How can you be sure you won’t miss important system events? Automate your message center with the Robot Message Management Solution. Key features include:

  • FORTRAThe thought of printing, distributing, and storing iSeries reports manually may reduce you to tears. Paper and labor costs associated with report generation can spiral out of control. Mountains of paper threaten to swamp your files. Robot automates report bursting, distribution, bundling, and archiving, and offers secure, selective online report viewing. Manage your reports with the Robot Report Management Solution. Key features include:

  • FORTRAFor over 30 years, Robot has been a leader in systems management for IBM i. With batch job creation and scheduling at its core, the Robot Job Scheduling Solution reduces the opportunity for human error and helps you maintain service levels, automating even the biggest, most complex runbooks. Manage your job schedule with the Robot Job Scheduling Solution. Key features include:

  • LANSA Business users want new applications now. Market and regulatory pressures require faster application updates and delivery into production. Your IBM i developers may be approaching retirement, and you see no sure way to fill their positions with experienced developers. In addition, you may be caught between maintaining your existing applications and the uncertainty of moving to something new.

  • LANSAWhen it comes to creating your business applications, there are hundreds of coding platforms and programming languages to choose from. These options range from very complex traditional programming languages to Low-Code platforms where sometimes no traditional coding experience is needed. Download our whitepaper, The Power of Writing Code in a Low-Code Solution, and:

  • LANSASupply Chain is becoming increasingly complex and unpredictable. From raw materials for manufacturing to food supply chains, the journey from source to production to delivery to consumers is marred with inefficiencies, manual processes, shortages, recalls, counterfeits, and scandals. In this webinar, we discuss how:

  • The MC Resource Centers bring you the widest selection of white papers, trial software, and on-demand webcasts for you to choose from. >> Review the list of White Papers, Trial Software or On-Demand Webcast at the MC Press Resource Center. >> Add the items to yru Cart and complet he checkout process and submit

  • Profound Logic Have you been wondering about Node.js? Our free Node.js Webinar Series takes you from total beginner to creating a fully-functional IBM i Node.js business application.

  • SB Profound WC 5536Join us for this hour-long webcast that will explore:

  • Fortra IT managers hoping to find new IBM i talent are discovering that the pool of experienced RPG programmers and operators or administrators with intimate knowledge of the operating system and the applications that run on it is small. This begs the question: How will you manage the platform that supports such a big part of your business? This guide offers strategies and software suggestions to help you plan IT staffing and resources and smooth the transition after your AS/400 talent retires. Read on to learn: