Only recently have companies been able to cost-effectively store massive amounts of data. Turning this data into knowledge that will give your business a strategic advantage is what business intelligence is all about. With the proper tools, this data can be used to help your company’s decision-makers predict future trends, identify opportunities, and maneuver your company in a fast-changing business environment. The strength of the AS/400’s database, DB2 Universal Database for AS/400 (DB2 UDB), makes the AS/400 an ideal platform on which to store and analyze this strategic data.
The data supporting business intelligence may be stored in several formats. One concept that is popular, especially for larger sites, is the data warehouse. A data warehouse employs a format that makes it easy for analytical tools to extract and use the information. To make massive amounts of data more accessible and faster to process, the information in the data warehouse is often presummarized or denormalized. Normalized data minimizes duplication of data and is better suited to transaction environments; denormalized data allows summarization and duplication of data, which makes this type of data easier to use in a data warehousing environment. A more focused form of data warehouses is a datamart. Datamarts are similar to data warehouses except that the information contained in a datamart typically only supports one aspect of a business. Because datamarts are more focused, they are usually easier to implement and will show a faster ROI.
There are three categories of decision support tools used in business intelligence:
• Ad hoc reporting systems—These systems support dynamic reporting and inquiries.
• Multidimensional analysis tools—Also referred to as online analytic processing (OLAP) tools, they use an OLAP server that prebuilds information into multidimensional summaries or hypercubes. Client software allows the information in these hypercubes to be viewed from many perspectives.
• Data mining tools—Data mining works backward by systematically analyzing data to identify potentially valuable relationships. The value of these relationships are then proven or discarded based on further analysis. The strength of data mining is that it can identify statistical anomalies or trends that are difficult to discern using traditional techniques.
Two categories of data are stored to support your business. The first category of data supports day-to-day operations; this data is referred to as operational data. Operational data is stored in a highly normalized format and optimized for transaction performance. In addition, because operational data supports all potential uses, it is very detailed. These factors often make it difficult or impractical to use operational data directly to support business intelligence systems, although recent improvements in indexing, processors, and storage have made it more practical for business intelligence systems to use operational data directly.
The second category of information supports business intelligence systems that retrieve and make available information about the business. We refer to this data as a data warehouse. Data warehouses contain data that has long-term strategic value. A data warehouse can also contain a massive amount of data because it contains information used to support decisions for the entire enterprise. Data stored in a data warehouse is optimized for flexibility and extraction performance. Another type of data warehouse is a datamart. The structure of data stored in a datamart more closely resembles the operational data from which it was extracted. As the number of datamarts in an organization begins to climb, a decision must be made whether to continue with the less efficient structure of multiple datamarts or to move to an enterprisewide data warehouse.
Data warehouses are also not a solution that you can go out and buy. You build a data warehouse to support the strategic and informational requirements of your business. Building a data warehouse is a strategic investment. For businesses that store massive amounts of data and use it to make strategic decisions, this investment will pay for itself very quickly, often in under a year, because users will make better information-driven decisions.
Extricating Your Data
One of the AS/400’s major strengths is its database. Several recent enhancements make this great database even better at storing and processing large databases. Most business intelligence systems rely heavily on SQL. Therefore, any enhancement to SQL provides a corresponding benefit to these business intelligence systems. These new enhancements include support for a query options file, encoded vector indexes, stored procedures, user- defined data types (UDT), user-defined functions (UDF), and large objects.
SQL Can Do It!
One new feature that is particularly useful in environments that include a lot of ad hoc reporting or data warehousing is support for encoded vector indexes (EVIs). An EVI is not used directly by applications; the query optimizer uses it to improve query performance. EVIs contain statistical information about the key values contained in a column of data. This statistical information helps the optimizer determine the best way of processing a particular key. One way the optimizer uses EVIs is in the construction of dynamically built selection bitmaps. In some cases, the optimizer builds a bitmap containing one bit for each row indicating whether the row should be selected or not. Building dynamic selection bitmaps using the information contained in an EVI is much faster than reading each row and setting its selection bit, which is how bitmap indexes are built on other platforms or when an EVI is not available.
Another feature now available with OS/400 V4R4 is the query options file. This file contains records used to adjust how the query optimizer processes queries. This file centralizes the settings used by the query optimizer when it processes SQL statements. In the past, several commands, system values, and a data area supported these settings. Now that these settings are stored in a single file, they can be accessed and set using SQL, which is much easier. The file that contains these settings is QAQQINI. To find out how to use
this file to improve the performance of SQL statements, see my article, “Tuning V4R4’s Query Optimizer the Easy Way” (Midrange Computing, March 2000).
Stored procedures and user-defined functions can also be useful when extracting or loading data to a data warehouse as they provide a mechanism for SQL to call custom functions. These functions can be used to cleanse data, precalculate information, and simplify client/server data extraction. Many vendor solutions provide data cleansing and transformation support; however, in my opinion, it is better to support these functions through the database rather than relying on a vendor-specific solution. The advantage to supporting these functions with SQL at the point of extraction is that they can be shared by more than one process.
Finally, with V4R4, there is support for new data types in SQL. The system now supports several types of large objects such as binary large objects (BLOBs) and character large objects (CLOBs). These objects allow you to store and maintain binary objects such as images and long strings of text. Processing these types of data requires more work on your part because you must provide support for things such as scanning using user-defined functions. Another new feature is support for UDTs that allow you to further classify the data on your system. As with large objects, you supply the support for these data types with user-defined functions. The main advantages to user-defined data types are that they rigidly define how data is used and how that data relates to other data in the database.
The Decision Triumvirate
There are three main categories of decision-support tools. These three categories are ad hoc reporting, multidimensional analysis, and data-mining tools. Some of the tools available today provide support for more than one of these categories.
Ad Hoc Reporting
If you have not done so already, investing in an ad hoc reporting and inquiry tool can save you and your staff quite a bit of time. These tools allow you to define reports and inquiries without having to compile or modify programs. You usually set up a base definition that your users select and run after applying additional selection formatting.
There are several AS/400-based tools as well as PC-based reporting solutions. The advantage of an AS/400-based solution is that it is easier to integrate with your other AS/400 applications and allows report management to be centralized. The strength of PC- based reporting solutions is that they provide better formatting capabilities and their output is more easily integrated with your PC-based applications.
OLAP is one of the most valuable and flexible tools available for user data analysis. OLAP tools extract data from your data warehouse, or directly from operational data, and build summaries and indexes that allow you to quickly select and easily view information from different perspectives. There is also a second hybrid type of relational OLAP (ROLAP) tool that allows you to join relational information into your hypercubes, which reduces data duplication because common data can be shared between multiple hypercubes.
OLAP tools have two main components. The first component, which builds the hypercube, is the OLAP server. An OLAP server extracts data from your database and then summarizes the data for a set of specified keys called dimensions. Once a hypercube is built, it is ready to be used by the second component, the client viewer. Depending on the OLAP product, additional components may support things such as joining of hypercubes, preprocessing of data, or drilling down into the underlying operational data.
The client viewer is the OLAP component that will most interest your users, allowing them to drill down into the data in a hypercube while changing perspectives and applying selections. This is an extremely powerful and flexible way to analyze data. Most OLAP viewers allow users to view the data in a variety of formats, including spreadsheets,
graphs, maps, charts, and reports. As the user drills down into the data, he can also apply formulas and add columns, making these tools even more flexible.
Many OLAP products simply use Excel with pivot tables to view the OLAP data. The main drawbacks to this type of client support are the limitations imposed by Excel on the number of rows allowed and the complexity of the interface. A better solution is a viewer designed specifically to work with OLAP data. With a few clicks, the better viewers will allow you to drill down into information while selecting, grouping, pivoting, and applying functions.
Mining for Minutiae
For many years, computer scientists have tried to emulate the human capability to learn. Computers have not yet met that goal; however, they are getting better at learning by identifying unknown associations in a set of data. The results of data mining are similar to what may be achieved using statistical analysis, but the methodology differs. With data mining, you analyze your data hoping to develop a new hypothesis, whereas with statistical analysis, a hypothesis is proposed and then proven by analyzing the data. Data mining can identify anomalies that might not be apparent or are difficult to discern using traditional methods.
The process of data mining is an iterative process, requiring multiple passes to yield valid results. The first step is to select and prepare the data. Next, the data is analyzed using several techniques that include prediction, association discovery, sequential pattern discovery, and time sequence discovery. As possible hypotheses are identified, they need to be refined to determine whether the results returned are applicable in the real world. You refine the answers derived by data mining using statistical analysis, predictive modeling, or other forms of analysis. The refining process often uses additional and varied data to ensure that the original hypothesis applies to different sets of data. This process is adjusted and repeated to further enhance the results.
Often, finding a small anomaly in your business environment can yield a significant business opportunity. You may have wondered why supermarkets are using shopper bonus cards. These cards allow supermarkets to record individual sales and associate them with the shopper’s name and address. Because of the volume, it is impractical to analyze this data using traditional statistical analysis. Data mining is capable of analyzing this volume and producing valuable customer purchase information used to target promotions at specific customers. So, don’t be surprised if you’re in the checkout line sometime and the clerk hands you a coupon for a free pack of Twinkies with your next purchase of Mountain Dew.
UNIX: It Isn’t Just for Dweebs Anymore
As you may have surmised, many processes performed by business intelligence systems are computationally intensive. UNIX systems like the RS/6000 have always had an advantage over the AS/400 for these types of processes. Many businesses extract operational data and move it to another system used exclusively for decision support in order to off-load the decision-support work. Many AS/400 shops can avoid off-loading data by simply upgrading their current machine or running these systems during off-hours.
One recently announced AS/400 feature that addresses the UNIX advantage is the ability to natively run UNIX software directly in the AS/400 Portable Application Solutions Environment (AS/400 PASE). In the past, business intelligence developers had to port their UNIX applications to the AS/400 to make them run. Now vendors can run their applications in the PASE environment, which can be an advantage, especially for computer-intensive routines.
Building Flexible Apps
Metadata, which is data about data, makes it possible for disparate applications to share and manage data intelligently. Metadata describes the data stored in your applications and data warehouse and can even be used to describe your applications. By describing data in a common format, business intelligence solutions, which may come from several vendors, are able to access the same information. Several of IBM’s largest business intelligence solutions providers now support IBM’s DB2 Metadata program. Vendors are now able to integrate and combine their solutions, making it easier to share a common set of data.
Using metadata to describe applications can also give your company an advantage. In some cases, generated code from the metadata creates the application. More recently, the power available on the AS/400 allows a runtime environment to interpret the metadata directly. In my business, new applications are defined using metadata, which is then interpreted to create applications. Some coding is still required for things such as data validation edits, but the runtime environment supports the majority of an application. Using metadata to define applications allows us to share a lot more application code, allowing us to prototype and develop applications very rapidly.
Graphical tools that help you organize and manage your system and application development efforts are becoming more prevalent than in the past. For instance, IBM has added many new features to Operations Navigator, part of IBM’s Client Access/400, making this more useful for managing your system and applications. Software vendors are also starting to create plug-in components for Operations Navigator to extend its function.
Share the Knowledge
Once you begin collecting data and transforming it into information, you will need a way to share that information. If your AS/400 is already connected to a network or the Internet using TCP/IP, this part is easy because you have dozens of options. The business intelligence tools that include client viewers in many cases include a browser-based option. Emailing information from your AS/400 is another good way to make information available to your users and customers. Remember that email is not secure outside of your network.
One the easiest ways to make information available is to post it to a Web page. The AS/400’s Web server is easier than ever to set up and configure. Operations Navigator and the Web-based administration tools make it a lot easier to set up and administer the AS/400’s HTTP server. If you have had trouble configuring and administering the HTTP server in the past, you should try using the Operations Navigator for these tasks.
A report server can help you distribute reports much more efficiently. You can purchase or build a report server that will monitor your output queues and distribute reports via email, via FTP, or by posting the report onto a Web page. Report servers reduce the number of reports actually printed and make storing and locating reports easier.
Now You Need to Decide
It is time to consider using data warehousing and business intelligence to give your company a competitive advantage. Combining these technologies with the database strengths of the AS/400 can help your company to remain competitive by giving your users access to information they need to react and adapt swiftly when there are changes in the marketplace. Business intelligence systems help your users understand the consequences of change by allowing them to understand the effect of past decisions, which can help them to more accurately predict what the consequences of future decisions will be.
The AS/400 that you already have is an excellent platform to run your business intelligence solutions on. Along with legendary stability, the AS/400’s database is optimized to process large volumes of data efficiently. Analytical software that used to run only on UNIX machines can now run natively on the AS/400. This will improve the quality and availability of business intelligence software that runs on the AS/400.
References and Related Materials
• AS/400 PASE home page: www.as400.ibm.com/developer/factory/pase/index.html
• DB2 for AS/400 SQL Programming (SC41-5611-03, CD-ROM QB3AQ803)
• DB2 for AS/400 SQL Reference (SC41-5612-03, CD-ROM QB3AQ903)