Putting Business Intelligence at Your Fingertips, Part I

A sound BI architecture should meet several goals:

  • Allow for rapid growth and the rapid addition of new "solutions"
  • Create an architecture in which the business entities and rules are adhered to across "consumers"
  • Ensure "one version of the truth"
  • Add value to the business rapidly

Let's begin with a few simple definitions.

  • Business intelligence (BI)—A business management term referring to applications and technologies that are used to gather, provide access to, and analyze data and information about company operations.
  • Data warehouse (DW)—The main repository of an organization's historical data, containing the raw material for decision-support systems. Furthermore, the repository is subject-oriented, time-variant, non-volatile, and integrated to support strategic analysis and reporting.
  • Data mart—A specialized version of a data warehouse predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart emphasizes easy access to relevant information.
  • Operational data store (ODS)—A repository of an organization's transactional data, containing the raw material for decision-support systems. Furthermore, the repository is subject-oriented, volatile, and integrated in support of tactical analysis and reporting.
  • Extract/Transform/Load (ETL)—A process in data warehousing that involves extracting data from outside sources, transforming it to fit business needs, and loading it into the data warehouse.
  • Online Transaction Processing (OLTP)—A class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. Sometimes referred to as the line-of-business "operational system."
  • Online Analytical Processing (OLAP)—An approach to quickly providing answers to analytical queries that are multidimensional in nature; for example, sales figures by customers located in Minnesota for the year 2007.
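
To make the OLAP example concrete, here is a minimal sketch of that Minnesota/2007 query using Python's built-in sqlite3 module; the table and column names are hypothetical and stand in for whatever the analytical store actually uses:

```python
import sqlite3

# Hypothetical schema: a small sales table joined to a customer dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        sale_date TEXT, amount REAL);
    INSERT INTO customer VALUES (1, 'Acme Co', 'MN'), (2, 'Border Inc', 'WI');
    INSERT INTO sales VALUES (1, 1, '2007-03-15', 1200.00),
                             (2, 1, '2007-08-02',  850.00),
                             (3, 2, '2007-05-20',  400.00);
""")

# "Sales figures by customers located in Minnesota for the year 2007."
rows = conn.execute("""
    SELECT c.name, SUM(s.amount) AS total_sales
    FROM   sales s
    JOIN   customer c ON c.customer_id = s.customer_id
    WHERE  c.state = 'MN'
    AND    s.sale_date BETWEEN '2007-01-01' AND '2007-12-31'
    GROUP  BY c.name
""").fetchall()

for name, total in rows:
    print(name, total)
```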

What the BI Architecture Often Looks Like

When first implementing a BI application, many companies focus only on getting data out of the OLTP system and placing it into a data mart as quickly and easily as possible, totally ignoring a broader architecture that supports growth and/or the addition of more data marts and consumers. This quick and dirty solution usually consists of extracting transactional data from the operational system on a periodic basis, performing some transformations and propagating the data to the data mart (see Figure 1).

This is sufficient and acceptable if no other data marts are required. Experience tells us that this is rarely the case.

http://www.mcpressonline.com/articles/images/2002/Business%20Intelligence%20V3--10240700.png

Figure 1: A simple BI architecture with only one data mart benefits only a few users.

Once the word is out that the first data mart is operational and consumers are receiving benefit from the new source of information, additional data marts will be requested. Of course, the seemingly easiest way to provide the new marts will be to replicate the same design and implementation that was used for the first mart. This approach is shown in Figure 2.

http://www.mcpressonline.com/articles/images/2002/Business%20Intelligence%20V3--10240701.png

Figure 2: A BI architecture with multiple data marts is better but still not as good as it could be.

Herein lies the problem. This "ad hoc" methodology of creating data marts without an overall architecture has disadvantages:

  • ETL and cleansing of data requires too many processes.
  • Duplicate processing burdens the operational system.
  • You're likely to have inconsistent data across data marts.
  • Additional data marts are increasingly more expensive to create.

Of course, there are some advantages as well, the largest being that you will not have to get the various data mart users to agree on common business entities as each consumer will have their own version of the truth!

What the BI Architecture Should Look Like

A more robust business intelligence architecture will make use of a "data warehouse," a structure that acts as an independent repository for data that is used for strategic analysis, query, and reporting. Furthermore, the architecture might also include an operational data store (ODS), a structure that acts as an independent repository for transactional data used for tactical query and reporting. This is illustrated in Figure 3.

http://www.mcpressonline.com/articles/images/2002/Business%20Intelligence%20V3--10240702.png

Figure 3: The recommended BI architecture includes a data warehouse and an ODS.

There are inherent advantages in implementing BI solutions using this architecture:

  • Extraction, cleansing, and transformation done once for all data
  • Rapid data mart creation
  • Consistent business entities across all data marts

Additional data marts can source their data from the data warehouse, which helps ensure consistent processing for all data—i.e., one version of the truth. As a new data mart is designed, you will find that much of the data required for it will already be in the warehouse. In other words, by eliminating duplicate processing and pulling the data once from the operational systems, undue work is minimized, making the whole system more efficient.

Of course, one of the disadvantages of using such a robust architecture is that all data mart users will need to agree on consistent definitions of business entities and rules. This will result in more work for the warehouse architects, work that will be both technical and political in nature.

Warning! When laying out a BI architecture, be prepared for complaints such as these:

  • "It will take forever to model the whole business. We won't ever finish the warehouse."
  • "The warehouse will contain data that's not used. Just think of all the extra disk space that will be needed."
  • "I can implement the data mart by itself a lot faster."

To counter these complaints, remember that the modeling activity includes only business entities for the data mart that is in progress, not the entire enterprise. By using the focused data model, only the data required to fulfill the current requirement is extracted from the operational system and placed into the data warehouse. And yes, it is very possible that there will be a lot of data stored in the data warehouse and data marts, especially considering that the requirements placed on the business intelligence system can be different from those of the transactional system. For example, the OLTP system might require only two years' worth of data, while the BI system requires five years' worth of data in support of trending and macro analysis. Why burden the OLTP system with five years of data?

As for the common declaration that "I can implement the data mart by itself a lot faster"...well, we call this guerrilla data warehousing! All of the BI concepts are used—extract data, transform data, load data, store data, query data—but it's done on a personal or local basis. This promotes multiple versions of the truth. And you have to consider the security issues surrounding the storage of vital business data in disparate locations around the enterprise. We are certain that every business has hundreds of Excel spreadsheets scattered around, masquerading as data marts.

What about the idea of "virtual data warehousing"? Can we just query the operational data directly using complex SQL requests? While this is certainly possible given the latest advances in database management system technology as well as the high-performance systems available, it is not recommended. In our experience, asking the operational system to meet the requirements of both a transactional system and business intelligence system is a tall order. At some point, the requirements will diverge and it will be more efficient and economical to have separate systems.

Based on the BI architecture described herein, let's look at one methodology you can use to design and implement the solution. Again, this represents just one methodology.

Overall Steps in the Methodology

This methodology includes the various steps needed to design, build, test, and monitor the BI system (Figure 4).

http://www.mcpressonline.com/articles/images/2002/Business%20Intelligence%20V3--10240703.png

Figure 4: Follow these steps to design and implement the recommended solution.

When designing and building the system, you can combine some steps or do them in parallel, while other steps will be dependent on previous steps. Many of the steps and sections of the methodology are iterative in nature, meaning that as you go through the process, it might be advantageous or required to go back and do the series of steps again, taking into account newfound information. Just be aware that activities or conclusions in one step can lead you back to others. Designing and building a business intelligence system is a process of discovery. Some of the discoveries will be used to iterate through the design and implementation of the solution. Other discoveries will be categorized and used as input for the next project.

It is also important to realize that good project management and some technical education will be required throughout all phases. And remember, the knowledge and experience gained will be utilized during the next round of data marts.

Define Requirements

Properly defining, identifying, and understanding the requirements is the most important step. Given that both business requirements and technical requirements are involved, participants in this activity include business leaders, technical leaders, key end users, and a decision-maker. The output of this step includes the following:

  • High-level requirements, both technical and business
  • A list of potential projects, with the first project selected
  • A feasibility study
  • Identification of project participants, both technical and business
  • Project scope
  • Skills and education required to move forward

During the requirements gathering, it will be very tempting to keep working until all the requirements are identified and agreed upon. This can be a trap we call "analysis paralysis." Thus we use the 80/20 rule: Gather approximately 80 percent of the requirements and start moving forward. The other requirements will fall into place, or you will discover the real requirements in a future step.

Create/Refine Data Model

After requirements are captured, the process of creating and refining a data model can begin. A data model is an understanding of how the entities in a business are defined, what they look like, and how they relate to one another. This step allows the data warehouse project team to gain an understanding of the business rules, to control redundancy, and to come to an agreement on common definitions and entities. This process will involve both the technical and business leaders.

The output of this exercise does not have to be too detailed, nor take months of effort. Do not attempt to model the entire enterprise; only model areas and entities specific to this project's scope—i.e., this data mart.
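
As a rough illustration (the real deliverable of this step is normally a logical model, not code), the agreed-upon entities and relationships for a hypothetical sales subject area might be captured like this:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical business entities agreed upon by the project team.
@dataclass
class Customer:
    customer_id: int
    name: str
    state: str          # common definition: two-letter postal code

@dataclass
class Product:
    product_id: int
    description: str
    product_group: str  # common definition shared by all data marts

@dataclass
class Sale:
    sale_id: int
    customer_id: int    # relates a sale to exactly one Customer
    product_id: int     # relates a sale to exactly one Product
    sale_date: date
    quantity: int
    amount: float
```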

Design Data Warehouse

Based on the requirements and high-level modeling, it is now possible to create the logical and physical database design. The design will also be based on any data mart prototyping, provided this step was done previously.

In general, a data warehouse model is based on getting data into the warehouse as quickly and efficiently as possible, storing large quantities of detailed data, and getting data out of the warehouse and into the data marts. These requirements usually lead to a more normalized data model.
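
To illustrate, a sketch of what that more normalized layout might look like for a hypothetical sales subject area follows; descriptive attributes are held once in their own tables, and the detailed rows carry only keys and measures:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A more normalized warehouse layout: customer and product attributes live
# in their own tables rather than being repeated on every sales row.
conn.executescript("""
    CREATE TABLE dw_customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        city        TEXT,
        state       TEXT
    );
    CREATE TABLE dw_product (
        product_id    INTEGER PRIMARY KEY,
        description   TEXT,
        product_group TEXT
    );
    CREATE TABLE dw_sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dw_customer(customer_id),
        product_id  INTEGER REFERENCES dw_product(product_id),
        sale_date   TEXT,
        quantity    INTEGER,
        amount      REAL
    );
""")
print("warehouse tables created")
```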

The output of this step includes these items:

  • Logical data model for the data warehouse
  • Physical data model for the data warehouse (tables, columns, etc.)
  • Information for the specific data mart addressed in this project

During the data warehouse design process, consider future growth in terms of more rows as well as additional columns to support future requirements.

Select Data Mart Tools

As soon as the requirements are gathered, you can start to evaluate and select what tool the end users will use to access, analyze, and visualize the information. In other words, you can start to evaluate data mart tools. This is an exciting step, but it's also fraught with problems. The excitement results from actually seeing new and sophisticated products. The problems arise because every vendor demos well, every tool looks great, and it's easy to get confused and languish in analysis paralysis.

When evaluating end user query, analysis, and reporting products, it is very important to remember and make use of the requirements you have identified. When considering tools, keep in mind these influencing factors:

  • What type of problem are you solving (OLAP, data mining, reporting, or simple querying)?
  • What are the performance expectations?
  • What is the realistic number of concurrent users?
  • What are the various types of users (power users vs. report "lookers")?
  • What is the anticipated ease of use, ease of learning, and ease of configuration and maintenance?

Another important factor is the budget. It makes no sense to evaluate $100,000 tools when the budget is $10,000. On the other hand, if your business problem requires a $100,000 solution, then by all means get it.

Is it acceptable to change query and reporting tools or to have more than one? The answer rests in meeting requirements. If the requirements dictate more than one tool, this is acceptable. If you discover that the current tool no longer meets the requirements, it is acceptable to change. The goal is to provide business value and a good return on investment (ROI).
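
One lightweight way to keep the evaluation tied to the requirements is a simple weighted scoring sheet. The sketch below is purely illustrative; the factors, weights, and tool names are hypothetical:

```python
# Hypothetical weighted scoring of candidate data mart tools against
# the requirements identified earlier (weights sum to 1.0).
weights = {"fits problem type": 0.30, "performance": 0.25,
           "ease of use": 0.25, "cost within budget": 0.20}

# Scores from 1 (poor) to 5 (excellent) gathered during vendor evaluations.
candidates = {
    "Tool A": {"fits problem type": 5, "performance": 4,
               "ease of use": 3, "cost within budget": 2},
    "Tool B": {"fits problem type": 4, "performance": 3,
               "ease of use": 5, "cost within budget": 4},
}

for tool, scores in candidates.items():
    total = sum(weights[factor] * scores[factor] for factor in weights)
    print(f"{tool}: {total:.2f}")
```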

Design Data Mart

Once the analysis, query, and reporting tool is identified, you can work on the logical and physical design of the data mart. In many cases, the tool selected drives the design. For example, online analytical processing (OLAP) tools may lead to a star or snowflake schema or may have their own data structure not reliant upon a relational database system. In most cases, query and reporting solutions will lead you to a relational database design. Sophisticated data mining tools may lead you to something completely different in terms of data structure. The design will be dictated by the tooling and algorithms (i.e., the type of mining).

For relational database structures such as star or snowflake schemas, the data mart design process converts requirements into dimensions and facts. The process also converts data into consumable information.
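
For the common relational case, a minimal sketch of a star schema for a hypothetical sales data mart (one fact table at the grain of a single sale, surrounded by dimension tables) might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Star schema sketch: dimensions describe the business context,
# the fact table holds the measures at the chosen grain.
conn.executescript("""
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, calendar_date TEXT,
                               month TEXT, quarter TEXT, year INTEGER);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, description TEXT,
                               product_group TEXT);
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        quantity     INTEGER,
        amount       REAL
    );
""")
print("star schema created")
```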

Prototype Data Mart

Once the data mart is designed and the data structures are defined, you can implement a functional prototype using the tool selected. While the data model is realistic, the data doesn't have to be. Generating some test data or a subset of current data is enough to demonstrate the tooling. During the prototyping and testing, you have a first chance to get good sizing information for the data mart.
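
A few lines of throwaway code are usually enough to seed such a prototype. The sketch below simply fabricates rows with Python's random module against a hypothetical fact table:

```python
import random
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fact_sales (date_key TEXT, customer_key INTEGER,
                product_key INTEGER, quantity INTEGER, amount REAL)""")

# Fabricate a year of plausible-looking rows: realistic model, fake data.
start = date(2007, 1, 1)
rows = [((start + timedelta(days=random.randrange(365))).isoformat(),
         random.randrange(1, 50),        # customer_key
         random.randrange(1, 20),        # product_key
         random.randrange(1, 10),        # quantity
         round(random.uniform(10, 500), 2))
        for _ in range(1000)]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0], "test rows loaded")
```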

This step is very exciting yet sometimes frustrating. The end users finally get to see what they will be using, so this is an opportunity for them to provide constructive feedback. New requirements might come up, and existing requirements may be confirmed or denied. It's time to finalize requirements and avoid enlarging the scope of the current project. Unmet requirements can be captured and used as input for the next iteration or project.

This is also an opportunity to get excited about the BI project and continue to drive sponsorship and gather support.

Source Data Warehouse Data

With the data warehouse designed, it's time to determine the operational sources of the warehouse data. In other words, map source data to warehouse data. This activity includes these steps:

  • Determining the best source for each data element
  • Determining the best extraction method for the data
  • Determining what transformation and cleansing must be done to the data
  • Setting up routines/tools to extract, move, transform, and load the data
  • Defining error-handling procedures to track, report, and fix invalid data
  • Implementing appropriate measures for data-quality tracking

During this process, it is important to keep in mind performance requirements to meet the operational batch window and data warehouse refresh requirements. If these requirements were not identified in the beginning, they will come out now.

During this step, the extract, transform, load (ETL) process and tooling will be evaluated and tested. In our experience, the ETL area represents about 70 percent of the effort in implementing a BI solution.
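
To give a feel for the work involved, here is a stripped-down sketch of one extract/cleanse/load routine with a simple reject file for invalid rows. The source layout and the dw_sales_stage target table are hypothetical, and a real implementation would typically use dedicated ETL tooling:

```python
import csv
import sqlite3

def load_sales(source_csv: str, conn: sqlite3.Connection, reject_path: str) -> None:
    """Extract rows from an operational export, cleanse them, load the good
    ones into a warehouse staging table, and write invalid rows to a reject file."""
    good, bad = [], []
    with open(source_csv, newline="") as f:
        for row in csv.DictReader(f):
            try:
                # Transform/cleanse: standardize state codes, validate amounts.
                state = row["state"].strip().upper()
                amount = float(row["amount"])
                if amount < 0:
                    raise ValueError("negative amount")
                good.append((row["sale_id"], row["customer_id"], state,
                             row["sale_date"], amount))
            except (KeyError, ValueError) as err:
                bad.append({**row, "error": str(err)})

    # Assumes a five-column dw_sales_stage table already exists.
    conn.executemany("INSERT INTO dw_sales_stage VALUES (?, ?, ?, ?, ?)", good)
    conn.commit()

    if bad:  # error handling: track and report invalid data for follow-up
        with open(reject_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(bad[0].keys()))
            writer.writeheader()
            writer.writerows(bad)
    print(f"loaded {len(good)} rows, rejected {len(bad)}")
```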

Source Data Mart Data

Given the data mart model, it is now possible to define and design the mapping of data elements to their source in the data warehouse. During this phase, the data mart model can be refined to include summarizations and calculated fields.

Some data mart designs may require an intermediate staging or load file to be created. The loading of data mart data can be set up using ETL strategies similar to those used for the data warehouse, allowing for automation as well as data integrity and quality tracking. The timetable for loading new data will be based on the requirements gathered previously. If the users need weekly sales information, the data mart can be populated on a daily or weekly basis. On the other hand, if the users need daily sales information for yesterday, then the data mart load process must occur at least on a daily basis.
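
A minimal sketch of sourcing a mart from the warehouse, summarizing detail rows into a daily sales table on the agreed schedule (hypothetical table and column names), might be as simple as this:

```python
import sqlite3

def refresh_daily_sales_mart(conn: sqlite3.Connection) -> None:
    """Rebuild a summarized data mart table from warehouse detail rows.
    Table and column names are hypothetical."""
    conn.executescript("""
        DELETE FROM mart_daily_sales;
        INSERT INTO mart_daily_sales (sale_date, state, total_quantity, total_amount)
        SELECT s.sale_date,
               c.state,
               SUM(s.quantity),
               SUM(s.amount)
        FROM   dw_sales s
        JOIN   dw_customer c ON c.customer_id = s.customer_id
        GROUP  BY s.sale_date, c.state;
    """)
    conn.commit()

# Scheduled daily (or weekly), according to the requirements gathered earlier.
```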

Tune Data Warehouse and Data Mart

Once the data warehouse and data mart are populated with the appropriate data, is it time to throw the BI project over the wall and let the users start working?

Of course not! One of the most important steps must occur first: namely, tuning the data warehouse and data mart. During this time, refinements can be made to ensure the data is refreshed in a timely and accurate manner. It is also important to evaluate performance of the ETL system, as well as the analysis, query, and reporting mechanisms. This will involve implementing a proper indexing strategy and possibly implementing materialized query tables. It will be very important to watch for usage behavior to determine if the initial requirements are still accurate. For example, is the data mart easy to use, and is it providing the proper information? One approach is to allow a few key users to take the solution out for a test drive. It will be advantageous to get their unbiased feedback before turning the solution over to the entire user community.
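
The specifics depend on the database and the observed workload, but the flavor of the indexing work is straightforward. Here is a sketch against the hypothetical tables used earlier; on DB2 for i, the same analysis would also weigh materialized query tables:

```python
import sqlite3

def apply_basic_indexes(conn: sqlite3.Connection) -> None:
    """Create indexes over the join and filter columns observed in the
    most frequent queries (hypothetical tables and columns)."""
    conn.executescript("""
        CREATE INDEX IF NOT EXISTS ix_dw_sales_customer ON dw_sales (customer_id);
        CREATE INDEX IF NOT EXISTS ix_dw_sales_date     ON dw_sales (sale_date);
        CREATE INDEX IF NOT EXISTS ix_dw_customer_state ON dw_customer (state);
    """)
```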

Congratulations! It's time to go live.

Monitor Usage and Maintain Data Warehouse and Data Mart

Once the BI solution is live, is the project over? Absolutely not. Monitoring and maintaining the BI solution are key to providing ongoing value and return on investment. Some of the items to monitor include these:

  • Which reports are used and not used
  • What new reports are being built
  • Which columns are used and not used
  • Which summarizations are used
  • Which summarizations are calculated at run time
  • Which queries are run most frequently
  • How performance levels are affected as more users and/or data are added to the system

As the information is consumed, new requirements will surface. The new requirements might be against the existing data mart or might necessitate a different data mart. Keeping a trained eye on the BI solution will help you to determine what to do next.
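
Much of this monitoring can be as simple as logging each report or query execution and reviewing the counts periodically. A minimal sketch with a hypothetical usage log:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE usage_log (run_at TEXT, user_id TEXT,
                report_name TEXT, elapsed_seconds REAL)""")

def log_report_run(user_id: str, report_name: str, elapsed_seconds: float) -> None:
    # Called by the reporting layer each time a report is executed.
    conn.execute("INSERT INTO usage_log VALUES (?, ?, ?, ?)",
                 (datetime.now(timezone.utc).isoformat(), user_id,
                  report_name, elapsed_seconds))

# Periodic review: which reports are used, and which are never run?
log_report_run("jsmith", "Daily Sales by State", 2.4)
for name, runs in conn.execute(
        "SELECT report_name, COUNT(*) FROM usage_log GROUP BY report_name"):
    print(name, runs)
```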

Moving on to the Next Project

As you complete the BI project and start delivering value, you will start to look to the next project. This is the natural result of success. So, what's different for the next data mart?

It's very likely the next data mart will focus on a different area of the business or enterprise. To support the new data mart, the warehouse data model may need to incorporate new entities or relationships. The data warehouse may need new tables and/or columns added, with sources identified for new data elements. Over time, more and more data will already be present in the warehouse, minimizing the need to source entirely new elements. During this process, it is important to remember to add only the data that is needed for the new mart.

Sourcing the new data mart data may reveal ETL processing that is the same as for previous marts. This allows further refinement of the data warehouse design to store cleansed, transformed, and augmented data.

It is very possible that new and different analytical and visualization tools may be required for the new mart. This might be due to a different set of users or a different business problem to be solved. This is OK, and it reaffirms one of the reasons a robust BI architecture supports multiple data marts.

Some Final Thoughts...

It is said that data warehousing is a journey. This is a suitable observation, given the continual change in business goals, focus, and requirements.

Your business intelligence system is considered successful not when it's operational, but when it's used.

Mike Cain

Mike Cain is the team leader of the IBM DB2 for i Center of Excellence in Rochester, Minnesota.
