Using IBM Watson Analytics to extend an enterprise's understanding of its own data doesn't happen automatically. Datawatch, an ISV whose software facilitates the use of Watson Analytics, explains how it can be done with its Monarch product.
The traditional strength of the IBM i has been its superior ability to handle data and databases and to report the information they contain to those who need it. A myriad of third-party business intelligence and reporting products for the platform have enhanced this ability further. However, although IBM hasn't done the best job of pointing it out, IBM Watson Analytics may be able to help SMB enterprises do an even better job of analyzing their own internal data.
Datawatch produces a product called Monarch, which offers self-service data preparation for analytics. It operates on PCs running Windows 7 SP1 or later that have 4 GB of RAM and at least 600 MB of disk space. Monarch can scale to thousands of users. Jon Pilkington, chief products officer at Datawatch, provides some insight into how Monarch can prepare data for use with IBM's Watson Analytics, among other outputs.
The Limitations of DB2 and SQL
DB2, SQL, and third-party tools have provided data analysis to IBM i users for decades. Nevertheless, DB2 and SQL have their limitations.
"While DB2 and SQL offer the ability to perform data analytics, it requires business users and analysts to be familiar with database operations and SQL queries," Pilkington notes. "Another limitation of using DB2 and SQL is that business users cannot join it with semi-structured data without first converting it to a structured data format and then uploading it to the database. This can be a time-consuming process, and it is not repeatable. Individuals need the ability to quickly parse data from semi-structured documents, join it with the structured data, even enrich it, and cleanse and prepare the data.
"Typically, users know what the end state should be or have a specific use case that analytics will support. The problem is that most business users face the inability to access data from disparate sources quickly and then be able [to] prepare, blend, and enrich the data in a set timeframe. Additionally, many business users find that they may need to perform this task multiple times."
Monarch as One Means of Exporting Data to Watson Analytics
"Third-party solutions, such as Datawatch Monarch, enable the sharing of specific database tables so users can perform analysis or join data without the need for writing specific SQL queries. Now, anyone that needs to access data can do so without asking IT to generate reports for each department's specific request. Monarch lets an administrator share definite tables from a database within departments without granting access to the entire database. Once the user has access to the table(s) from the database, the user can then perform joins without the need of writing SQL queries. This enables information workers to leverage self-service analytics and perform data preparation without SQL expertise or complex scripting."
Monarch outputs can be fed to a variety of data export targets, including Access, Angoss, CSV, Datawatch Designer, Excel, Qlik, and Tableau. For our purposes here, the significant targets are IBM Watson Analytics and Cognos Analytics.
"Watson Analytics is a great service for analyzing data," Pilkington continues. "However, most users need to first cleanse, enrich, and join multiple (sometimes semi-structured) datasets prior to uploading into Watson Analytics. We work with IBM to directly integrate Watson Analytics into Datawatch Monarch. Not only is Monarch the recommended data preparation tool for Watson Analytics, we have connectors built into the product that allow for seamless export of data directly into Watson Analytics. Additionally, IBM resells a version of the solution as 'Monarch for IBM Analytics.' The combination of Datawatch Monarch and Watson Analytics enables business users to discover new insights and ask more complex questions. It also lets business leaders make decisions more quickly with access to all of the data."
Setting Up Data for Using Watson Analytics in the Cloud
So how does an enterprise prepare its data for exporting to Watson Analytics via the cloud? Pilkington explains.
"Datawatch recommends users set up a workspace that contains the instructions of how to prepare, blend and enrich the data in Monarch. This workspace lets users upload prepared data into Watson Analytics and creates a repeatable process that reduces data preparation time. Datawatch Monarch enables users to prepare the data for Watson Analytics through a repeatable process by simply applying a template to new files. This means an individual doesn't have to spend time performing the same data preparation operations multiple times."
"Many users perform repetitive tasks, and they need a quick solution to prepare and analyze the data from disparate sources," Pilkington elaborates. "Datawatch Monarch enables these users to create workspaces that prepare data and load it into Watson Analytics for analysis. Monarch has many standard user functions predefined, so all users need to do is right-click and select the appropriate option. There's no requirement for the user to write code or scripts. Once the user creates this workspace, it can be used in the future without the need to prepare the data again. Users simply point to the new dataset(s) and the results are ready to be loaded into Watson Analytics."
Another problem with data analysis can be a lack of client skills in configuring, testing, and managing analytics tools and environments. Monarch compensates for that by making its functions as simple as possible.
"Datawatch Monarch is designed for users at all levels from the standard business user to developers and IT personnel," Pilkington emphasizes. "Most of the tasks can be accomplished by dragging and dropping files and right-clicking to select the desired action. Monarch also makes joining datasets easy with the Join Analysis feature. This eliminates the need to write any SQL queries or code. Business users simply select an option and the fields they want to appear in the new table. Users even have the ability to save database connections in the Library so they don't need to create or remember complex connection strings, user credentials, and so forth, to access authorized databases. However, Monarch doesn't bypass any predefined security settings, so users can only gain access to specific databases."
One feature of IBM Watson Analytics is its ability to draw information from unstructured data. Monarch is designed to help with that process.
"Monarch is able to take semi-structured and unstructured data and convert it to a structured format that can be used to blend together with other datasets or analyzed as a single structured dataset," Pilkington says.
Verifying Data Quality
Even back when I was a young lad learning my Fortran 77, "garbage in, garbage out" was a familiar mantra. Similarly today, data analysis can only be as good as the inputs.
"There are several ways to verify the quality of the data with Datawatch Monarch," Pilkington reports. "Business users can access a Report Verify option that goes through an entire document and verifies the correct fields have been parsed from it. If you create a field that only captures partial data, the Report Verify will identify that and provide you with notification to adjust the field.
"Alternatively, individuals have the ability to convert data types with Monarch. Many times, they need data in a specific format or a certain data type in order to conduct analytics. In addition, users can perform calculations, grouping, summaries and even negative joins (for reconciliation purposes). In fact, many use Monarch to verify reports and identify discrepancies. The summaries let users compare individual items to a summary or grand total to ensure account balances are correct and to verify the template used to parse semi-structured documents."
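For readers unfamiliar with the terms, a negative join keeps the rows in one dataset that have no match in another, and a summary check compares the sum of detail lines to a reported total. The Python sketch below illustrates both ideas under stated assumptions; the function names, the sample ledger and bank data, and the rounding tolerance are hypothetical and do not represent Monarch's own features or API.

```python
def negative_join(left, right, key):
    """Return rows of `left` whose key value has no match in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]

def totals_match(rows, field, reported_total, tolerance=0.005):
    """Check that the detail lines add up to the reported grand total."""
    detail_total = sum(row[field] for row in rows)
    return abs(detail_total - reported_total) <= tolerance

# Hypothetical data: invoices in the ledger versus payments seen by the bank.
ledger = [
    {"invoice": "INV-501", "amount": 250.00},
    {"invoice": "INV-509", "amount": 1200.00},
    {"invoice": "INV-517", "amount": 75.50},
]
bank = [
    {"invoice": "INV-501", "amount": 250.00},
    {"invoice": "INV-517", "amount": 75.50},
]

unmatched = negative_join(ledger, bank, "invoice")  # invoices with no payment
balanced = totals_match(ledger, "amount", 1525.50)  # detail vs. grand total
```

Here `unmatched` holds the single ledger invoice that the bank data does not contain, which is the kind of discrepancy a reconciliation is meant to surface.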
Coping with Data Governance and Data Silos
Data governance issues can come up when one department of an enterprise asks questions that result in answers more properly handled by another department. Pilkington shares how Monarch can handle that problem.
"With Datawatch Monarch, business users can create workspaces and datasets to share with specific individuals or departments. So, if Department A asks a question that falls into Department B’s area of operations, Department B can share the appropriate curated datasets and workspaces with specific individuals in Department A. On the other side of things, if Department A prepares a dataset and identifies information of value to Department B, Department A can share the workspace or datasets with Department B to enable Department B to further analyze and provide an answer."
Similarly, issues can arise when departments or divisions are so possessive of their data that boundaries create "data silos." Monarch also has an answer for that.
"Because Datawatch Monarch can pull datasets from disparate sources and blend them all together in one workspace, the business user can have different datasets stored in various locations and use Monarch to access, filter, join and enrich the data," Pilkington related. "Using Monarch, individuals can even automate this process by monitoring specified directories, and beginning a process on a fixed schedule or anytime a new document appears. This will eliminate the need for the user to continuously check multiple directories for updated or added files."
Watson Analytics and the IBM i
Datawatch's Monarch is a useful example of how PC-based software can function as an intermediary between IBM i databases and IBM Watson Analytics via the cloud. Without passing judgment on the relative merits of Monarch compared with those of its competitors, it's fair to say that the product shows there are ways and means available in the marketplace that make the Watson Analytics app accessible, and therefore of potential value, to IBM i SMBs today rather than years from now.