Analytics breeds knowledge. Knowledge is POWER.
This article continues the series that temporarily replaces my "In the Wheelhouse" column. Through the rest of this year, I'll discuss what IBM has been investing its time in: Cloud, Analytics, Mobile, Social, and Security (CAMSS). I started by addressing strategy and then provided an overview.
Of the five CAMSS components, analytics has arguably taken first place on IBM's importance scale. If you don't believe me, take a look at the new cognitive-themed www.ibm.com and see if you can find content that isn't related to analytics in some capacity.
Joining me this week is IBM's Doug Mack, Analytics Consultant – DB2 for i Center of Excellence. Doug's team works as a part of Power Systems Lab Services, and we had a long chat about DB2, analytics, modernization, where IBM is going, and how it pertains to IBM i customers.
Steve Pitcher: Doug, can you talk to me a little about what your team does at IBM and how it supports the analytical component of CAMSS?
Doug Mack: We're about technology enablement, SQL performance assessments, education, database modernization planning, and strategy for customers. One reason customers engage us is to help open up and modernize DB2 for those analytical capabilities. We've been doing a lot with DB2 Web Query and our new Data Migrator tool. We architect and design infrastructures for analytics. That might be a data mart or a data warehouse, depending on what tools are actually deployed for analytics.
SP: I spoke with someone at COMMON in Anaheim this April, and it was very interesting that he said he just hired an "IBM i database administrator." There were six or seven people around the lunch table who just had their jaws on the floor. I mean, historically you've never needed one in an official capacity, but with DB2 growing in leaps and bounds in terms of capabilities, the concept of an IBM i DBA doesn't seem that farfetched. What are you seeing from a modernization perspective? Are many customers modernizing their databases and related skill sets to a higher degree than the front ends and business languages?
DM: Having that data-centric perspective is important whether you're going through a modernization effort or not. One of the biggest reasons is performance. You want to make sure that you have a good idea of what your database is doing. A very basic and common example: if you've got a database with a record limit, you need to know about it; otherwise, your application is going to stop in its tracks.
Taking advantage of database skills can be very useful. Maybe you want to take business rules out of your high-level language code and put them into the actual database. You can now build any application, anywhere, to get access to the database and be assured that those business rules are consistent and adhered to. One example of that would be row and column security, which was added not too long ago. If you've got three regions in your company (Western, Central, Eastern) and you've got a Western region manager, when they want to get at their part of the database you can ensure that they can't see the regions they're not supposed to see. All of those regions are stored within the same table. That's row-level security. Column-level would be like a salary field in a table: if you're not a manager, then you should never have access to that column.

When you go through the process of modernization, you can implement row and column security attributes at the database level rather than at the application level. By doing so, you can open up your database to any application yet be confident that your data is secure, because it's managed at the database level. There are even more examples, like moving from traditional RPG record-level access, one record at a time, to using SQL as your I/O access method with a set-at-a-time approach. There are a lot of good reasons to do that. You have to manage performance with that as well to ensure you're getting the most out of the SQL processing.
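Doug's row- and column-level example is worth sketching. On DB2 for i, the native mechanism is row and column access control (the CREATE PERMISSION and CREATE MASK SQL statements introduced in IBM i 7.2). As a rough, portable illustration of the underlying idea — the filtering rule lives in the database, not in each application — here's the same effect approximated with a view in SQLite; the table, names, and region value are all hypothetical, and a real RCAC rule would consult the current user rather than a variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, region TEXT, salary INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    ("Alice", "Western", 90000),
    ("Bob", "Central", 85000),
    ("Carol", "Eastern", 88000),
])

# Stand-in for the session's identity. In DB2 for i, a CREATE PERMISSION
# rule would check the actual user/group, not a hardcoded value.
session_region = "Western"

# Row rule: only the caller's region is visible.
# Column rule: the salary column is masked out (returned as NULL).
conn.execute(f"""
    CREATE VIEW staff_for_session AS
    SELECT name, region, NULL AS salary
    FROM staff
    WHERE region = '{session_region}'
""")

rows = conn.execute("SELECT name, region, salary FROM staff_for_session").fetchall()
print(rows)  # only Western rows, salary hidden
```

Any application that queries through the database-level rule sees only what it's entitled to see, which is exactly the "open up the database to any application" payoff Doug describes.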
SP: Given that we live in an age where security is so paramount in every solution, getting the business rules out of the language and into the database is a great thing from a data integrity point of view. Are you seeing a lot of customers looking for assistance in that type of modernization?
DM: Absolutely. We do a lot of education. [For] customers who want to get the value out of a modernization approach, we might be engaged to come in and figure out how best to get there with minimal impact to their applications and their environments. If you take the right approach, it can be a lot less painful than the thought of ripping and replacing your databases and applications. Every customer is different. We can identify areas to get a customer the best value. As you do that, the customer gets their feet wet in the process, so when things come up down the road they'll have the skills to do that themselves.
SP: You mentioned Data Migrator. Can you elaborate on that a little bit?
DM: What we're trying to do is provide more of a business intelligence and data warehousing solution, first and foremost, because that's where a lot of our customers are. Secondly, from a database perspective, we're opening up the database so it can be a source for more of the advanced analytics that may not run on IBM i. From an IBM i standpoint, we've got a strategy for supporting business intelligence and data warehousing and providing tools to monitor it from a performance standpoint. From an advanced analytics standpoint, with the appropriate SQL interfaces in place, we want to provide DB2 for i as a solid data source.
Take an example. IBM has an analytics solution; we bought a company called SPSS. This product doesn't run on IBM i, but it can sit on AIX, Linux, or even Windows and access DB2 for i.
Since 2007, we felt like we had a need to deliver a graphical business intelligence solution rather than relying on some of our partners to provide that. We wanted to build something specifically for IBM i that took advantage of some of the things we were doing in DB2 for i from a query acceleration standpoint.
I still haven't answered your question, but we'll get there.
So, basically the strategy has been to provide some business intelligence tooling on IBM i at low cost. That was part one, and that's called DB2 Web Query. What we really wanted to add to that was the ability to architect a solution that builds what I would call a reporting repository. An isolated, optimized database, if you will, to support the business intelligence queries. It may not be right for everybody due to cost or the size of your databases, but for a lot of clients it makes sense to approach this from an architecture that says: let's grab the data on a nightly basis, pull it out of my traditional operational application databases, and pump it over to another LPAR or another server, reshaping that data to optimize it. What do I mean by that? Well, a simple example: imagine you had a record with 150 fields, and 40 of those fields are for future use and blank. They're not even being used. Why even bring them over to the second server or LPAR? I can eliminate them and improve performance in business intelligence processing.
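The "leave the 40 unused fields behind" idea amounts to a small pruning step in the nightly extract. A minimal sketch, with hypothetical column names — in practice the kept-column list would come from analyzing which fields the reports actually use:

```python
# Columns the reporting repository actually needs (hypothetical names).
KEEP = ["order_id", "customer", "region", "amount"]

def prune(record: dict) -> dict:
    """Copy only the columns the reporting repository needs,
    dropping unused/future-use fields before transfer."""
    return {col: record[col] for col in KEEP}

# One operational row, including blank future-use fields.
source_row = {
    "order_id": 1001, "customer": "Acme", "region": "Western",
    "amount": 250.0, "future_use_1": "", "future_use_2": "",
}

pruned = prune(source_row)
print(pruned)
```

Fewer columns shipped means less data moved nightly and narrower rows for the business intelligence queries to scan on the target.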
The other thing is reshaping the data from an analytics perspective. A good example would be a marketing application. Building a customer-centric data model is a better way to support that type of application than digging through 18 different tables in your operational databases to get that customer-centric perspective. I can reshape that data to get a new, relevant perspective that's more appropriate.
So what is Data Migrator? That's the tool we introduced to the DB2 Web Query family that automates that process. It runs in the background on a schedule and grabs data from operational databases or journal receivers. It can pick up changed data from the journal receivers and then update the target reporting repository. We can debate the semantics of whether that's a data warehouse, a data mart, or an operational data store, but essentially it's an isolated, optimized reporting repository. With these products and features, a lot of the CAMSS principles are there to help plan for the future.
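The change-data-capture flow Doug describes — read changed rows from the journal, replay them against the target — can be sketched as a simple apply loop. The entry format here is invented purely for illustration; Data Migrator's actual journal-receiver interface and entry layout are different.

```python
# Conceptual sketch of applying journaled changes to a reporting target.
# Each entry is (operation, key, row) -- an invented format for the
# example, not the real journal-receiver layout.
def apply_changes(target: dict, journal_entries):
    for op, key, row in journal_entries:
        if op == "PUT":           # covers both insert and update
            target[key] = row
        elif op == "DELETE":
            target.pop(key, None)
    return target

target = {1: {"customer": "Acme", "amount": 100}}
entries = [
    ("PUT", 2, {"customer": "Globex", "amount": 75}),   # new row
    ("PUT", 1, {"customer": "Acme", "amount": 120}),    # update
    ("DELETE", 2, None),                                # later removal
]
print(apply_changes(target, entries))
```

Replaying only the changed records is what lets the nightly refresh avoid re-copying the entire operational database each time.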
While I have you, one thing that doesn't get a lot of airplay is what the DB2 for i development team has been doing to support the things that would occur in an analytics application.
SP: Please go on.
DM: Well, some of the Encoded Vector Index enhancements that we've made. EVIs are built into the database, and they've been around for a long time, but we've enhanced them to be able to store aggregate data in the index. A lot of analytics queries look at data at an aggregation or summary level to spot trends and drill down. The idea of storing a precalculated aggregate in the index just works. If I can answer the question of what the sales numbers have been in the last quarter by pulling that out of the index, that's going to be extremely fast versus going against detail-level data and having to calculate it at the time the report is run. That's one example. Another is an EVI enhancement that should be delivered in another couple of weeks. We can use a technique to make a query run faster without having to plow through very large numbers of records in underlying tables.
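The precalculated-aggregate idea can be modeled in a few lines: maintain a running sum keyed by quarter as detail rows arrive, then answer the summary query from that structure instead of scanning the detail. This is only a conceptual model of what an EVI with aggregates buys you, not how DB2 for i implements it.

```python
from collections import defaultdict

# Detail rows plus a maintained aggregate, standing in for an EVI
# that stores precalculated sums alongside its key values.
detail = []
agg_by_quarter = defaultdict(float)

def insert_sale(quarter: str, amount: float):
    detail.append((quarter, amount))
    agg_by_quarter[quarter] += amount   # "index" maintained on write

for q, amt in [("Q1", 100.0), ("Q1", 50.0), ("Q2", 75.0)]:
    insert_sale(q, amt)

# The summary query is answered from the aggregate --
# no scan of the detail rows at report time.
print(agg_by_quarter["Q1"])  # 150.0
```

The trade-off is the classic one: a little extra work on every insert buys a constant-time answer at report time, instead of recalculating over the detail rows whenever the report runs.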
People talk about in-memory databases. We've always had the concept of single-level store, and we've added additional ways of taking specific tables and indexes and pinning them to memory, so business intelligence workloads get the best possible performance.
SP: One last question. With performance in mind, I recently rediscovered Index Advisor...
DM: That's a great example of where the DB2 for i tooling is critical to help manage and monitor performance in an application. IA and Visual Explain and being able to analyze SQL plan cache, all of these are great tools that are shipped with the database. Our database development team has built those tools, and they're shipped with the GUI interface through Navigator. They're critical to monitoring and managing performance aspects of your applications.
With regards to Index Advisor, some companies do everything that it tells them. It's a strategy, but it's certainly not for everyone, and we wouldn't necessarily recommend it. This is where that database expertise comes into play. You need to be able to look at the database within the context of the application and make decisions based on the advice given to you by Index Advisor. Some things make sense to implement and others may not because of the nature of the application. Let's say you have two index recommendations on the same table. One is to index columns A and B. The second is to index columns A, B, and C. You wouldn't build the first index because the second index would cover it. There's a bit of art and science involved in using the information, but the information is really good.
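Doug's "don't build (A, B) when you're building (A, B, C)" rule comes down to a prefix check: a recommendation is redundant when its columns form a leading prefix of a wider recommendation on the same table. A minimal sketch of filtering a recommendation list that way:

```python
def is_prefix_covered(candidate, other):
    """True if candidate's columns are a leading prefix of other's,
    so the wider index can satisfy the same ordered lookups."""
    return (len(candidate) <= len(other)
            and list(other[:len(candidate)]) == list(candidate))

# Two hypothetical recommendations on the same table.
recs = [("A", "B"), ("A", "B", "C")]

# Keep only recommendations not covered by a wider one.
keep = [r for r in recs
        if not any(r != s and is_prefix_covered(r, s) for s in recs)]
print(keep)  # [('A', 'B', 'C')]
```

That's the mechanical part; the "art" Doug mentions — deciding which surviving recommendations are worth the maintenance cost for this particular application — still takes a human.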