Focus on Data Warehousing


William Inmon is credited as the “father of data warehousing” on the strength of his 1993 book, Building the Data Warehouse. In this book, he defined a data warehouse as “a subject oriented, integrated, non-volatile, time variant collection of data in support of management’s decisions.”

Anyone who’s been in the information processing industry for any significant amount of time knows that there are two types of data—the real-time production data representing the current state of a business and the historical data by which many decisions are made. Inmon’s contribution to the world of computing was the assertion that these different types of data require different types of storage structures and retrieval mechanisms.

If Inmon is the father of data warehousing, then his ancestors are the many programmers who have worked miracles with indexed files, programs, and query tools through the years. I’ve worked with many applications that included summary files, most of them updated from transaction files during nightly processing. Managers have relied on these summary files more than on the raw production data to make decisions. Just think, you may be Inmon’s grandfather or grandmother, sort of.

Summary files were a sort of poor man’s data warehouse. They were separate from the production data, they typically included “buckets” (fields representing periods of time), and they were designed to answer specific questions.
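To make the idea of buckets concrete, here is a minimal sketch in Python of the kind of roll-up a nightly job might perform. Everything in it is an assumption for illustration: the record layout, the customer IDs, the amounts, and the build_summary function are hypothetical, and a real summary file would have been an indexed file maintained by an RPG or COBOL program rather than an in-memory dictionary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (customer ID, sale date, amount).
transactions = [
    ("C001", date(1999, 1, 15), 250.00),
    ("C001", date(1999, 1, 28), 100.00),
    ("C001", date(1999, 3, 2), 75.50),
    ("C002", date(1999, 2, 10), 400.00),
]

def build_summary(transactions):
    """Roll transactions up into one summary record per customer.

    Each record holds twelve monthly buckets (index 0 = January),
    mimicking the period fields of a traditional summary file.
    """
    summary = defaultdict(lambda: [0.0] * 12)
    for customer, sale_date, amount in transactions:
        summary[customer][sale_date.month - 1] += amount
    return dict(summary)

summary = build_summary(transactions)

# First-quarter buckets for customer C001.
print(summary["C001"][:3])  # [350.0, 0.0, 75.5]
```

A manager asking "what did this customer buy in January?" reads one bucket instead of scanning every transaction, which is the kind of specific question these files were built to answer.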

This was not easy work. I told one of my bosses years ago that he was mistaken in thinking that he needed a programmer on his payroll. What he really needed, I informed him, was a magician. Nevertheless, IS departments did a good job of providing managers with the information they needed. Data warehousing can help IS departments do an even better job.

There has never been a better time to implement a data warehouse. The first reason is that software tools are better than ever. Many people have put in many hours writing programs to replicate, cleanse, load, and analyze data. RPG and COBOL programmers no longer have to write their own programs to perform these functions.

Second, consultants have gotten better at setting up data warehouses. Other people have paid them to learn and make mistakes, and you get the benefits. An experienced consultant can probably tell you not only what it will cost to implement a data warehouse but also what sort of return you can expect on your investment.

Third, you can expect a positive return on your investment. As with any other type of project, some data warehousing projects fail. However, a study by International Data Corporation (IDC) claims the average ROI of a data warehousing project is 401 percent over a three-year period. This figure is cited in numerous places on the Web. One of these is IBM’s publication A Practical Guide to Getting Started with Data Warehousing at com/data/busn-intel/dwatgbroch/. This publication has other useful information, including case studies and a glossary of terms. The language is slightly stilted in places, but don’t let that stop you from gleaning some good information.
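To get a feel for what a figure like 401 percent means, here is a rough sketch in Python. It assumes the simplest definition of ROI, net benefit divided by cost, with no discounting. The IDC study's actual methodology is not described here and may well differ, and the dollar amounts and the simple_roi_percent function are hypothetical.

```python
def simple_roi_percent(total_benefit, total_cost):
    """Undiscounted ROI as a percentage: net benefit divided by cost.

    This is an assumed, simplified definition, not necessarily
    the one used in the IDC study.
    """
    return 100 * (total_benefit - total_cost) / total_cost

# Hypothetical project: $100,000 spent, $501,000 in benefits over three years.
cost = 100_000
benefit = 501_000

print(simple_roi_percent(benefit, cost))  # 401.0
```

Under this assumption, every dollar spent on the project comes back along with about four more over the three years.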

IDC found three ways that data warehouses pay off. First, the older decision support systems are thrown away, and the resources they occupy become available for other uses. Second, data warehouses require minimal IT support. Third, managers make decisions based on information from the data warehouse, information that is not available from other sources. It is this third factor that results in the greatest return.

If you’ve never implemented a data warehouse, keep in mind that you don’t have to go it alone. Besides consultants, there are plenty of books and magazine articles to help you. Much of this material is available free on the Web. Of course, your colleagues who are involved in data warehousing are another source of information.

I hope the data warehousing articles in this month’s Midrange Computing and in future issues prove helpful to you.