We’ve been talking about the tools and apps you’ll probably need to get AI up and running. Now, let’s come back to Watson and its product line.
We talk about Watson as if it’s a single entity, but in reality the Watson line is a conglomeration of products. At its heart, of course, is the server, the parallel array that provides the kind of horsepower required to churn through the many iterations required to properly train an AI model on a large set of data. And there’s also the special language that allows the Watson models to understand both structured and unstructured data. But beyond that are a number of other packages that help simplify and automate the process.
You know, of course, that the three main activities that have to be done in an AI project are first to assemble the data, second to develop a model that approximates the problem you are trying to solve, and third to bring the data and the model together to train the model and develop it to the point where you are getting accurate predictions.
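If it helps to see those three activities in miniature, here is a rough sketch in plain Python. It is only an illustration, not anything Watson-specific: the dataset is synthetic (ten points on the line y = 2x + 1), the "model" is a straight line, and the names `predict` and `train` are made up for this example.

```python
# Step 1: assemble the data. This is a tiny synthetic set (y = 2x + 1),
# invented purely for illustration.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]


# Step 2: develop a model that approximates the problem.
# Here the model is just a straight line with two parameters.
def predict(w, b, x):
    return w * x + b


# Step 3: bring the data and the model together and train.
# Plain gradient descent on mean squared error, repeated many times.
def train(data, learning_rate=0.01, epochs=5000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train(data)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"prediction for x=20: {predict(w, b, 20.0):.2f}")
```

Even in this toy version, the training loop runs thousands of passes over the data, which is why real models on large datasets need the kind of horsepower mentioned earlier.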
Once that’s done, you still have two more issues to deal with—namely, moving your test system to production and properly scaling it to handle the real-world data level.
Over the past few months, we have looked at some open-source options that you can use to build and train models (June, July, August). But not everyone wants a roll-your-own approach, where you handle the integration between the various tools you’re using.
Fortunately, for those who are looking for a more integrated approach, IBM has a number of products in the Watson family that can be used to get your AI app up and running. And the first one we want to look at is Watson Studio.
Watson Studio is a single, integrated environment that includes both open-source and pure IBM software. It also comes complete with an active user community, sharable projects to get you started, and the usual comprehensive (that is, there’s a lot of it) IBM documentation.
The first thing you can do with Watson Studio is to find and use the public datasets that come packaged in this application. There are over 300 of them, and they range across topics like population, wages, jobs, weather, and many other categories. True, this is not data that’s specific to your company, but if your model is looking at factors outside of your normal company span, you might be able to start the training process with one of these datasets.
You can view the data and bookmark it for inclusion in a specific project. The set can then be downloaded and added to the Data Assets part of the project. Once a dataset is selected, you can create a notebook that contains this data and specifies the language that will be used to process it when you train your model.
Manage Object Storage
Or maybe you are not using the public datasets that we talked about above. Perhaps you are using your own data or public data that is not part of Watson. Fortunately, it is easy to access these data sources using the Watson Studio Object Storage function.
When you create a project in Watson, it automatically creates (provisions, to use the proper term) an object storage instance in the cloud. The storage area is broken down into buckets. If you look at those buckets via the Studio workbench, you will see that a Jupyter notebook named for your project has been created. All of the data associated with your project will be listed there. From there, you can add, delete, view, and download your data, and you can see the SQL URL used to access each object.
You can create additional buckets (the name must be unique within the IBM universe) and set the resiliency and performance settings. The storage class you choose determines the cost, based on how frequently you expect to access the data.
Access to the buckets is controlled by Identity and Access Management (IAM) policies that you can create within Studio.
To anyone who has watched as many World War II movies as I have, the word “collaborator” has an unpleasant sound. But in the world of AI, it is a very necessary and vital part of the project. Even though a technical person may be doing most of the detail work, many other people will be interfacing with the project and need access to data and other resources. This could be a security nightmare, but fortunately Studio handles it for you.
There are three levels that can be assigned to each collaborator. Viewers are only allowed to view a project. Editors can control project resources. Admins control project resources, the collaborators, and the project settings. In addition, an admin can create service IDs, which are used to allow applications outside of the IBM cloud to access the project. One nice thing is that although an admin creates the service IDs, the IDs are not tied to that admin’s profile so that as people come and go in the project, the service IDs remain.
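To make the three levels a little more concrete, here is a small sketch of how such a permission model could be represented. This is not Studio's API or its internal design; the action names (`view_project`, `control_resources`, and so on), the `Project` class, and the user and service ID names are all hypothetical, chosen only to mirror the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    EDITOR = "editor"
    ADMIN = "admin"


# Hypothetical action names that mirror the three levels described above.
PERMISSIONS = {
    Role.VIEWER: {"view_project"},
    Role.EDITOR: {"view_project", "control_resources"},
    Role.ADMIN: {
        "view_project",
        "control_resources",
        "manage_collaborators",
        "change_settings",
        "create_service_id",
    },
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]


@dataclass
class Project:
    collaborators: dict = field(default_factory=dict)  # user name -> Role
    service_ids: set = field(default_factory=set)      # owned by the project

    def create_service_id(self, user, service_id):
        if not can(self.collaborators[user], "create_service_id"):
            raise PermissionError(f"{user} cannot create service IDs")
        # The ID is stored on the project, not on the admin who created it.
        self.service_ids.add(service_id)

    def remove_collaborator(self, user):
        del self.collaborators[user]


project = Project(collaborators={"alice": Role.ADMIN, "bob": Role.VIEWER})
project.create_service_id("alice", "svc-reporting")
project.remove_collaborator("alice")
print(project.service_ids)  # the service ID is still there after alice leaves
```

Because the service IDs live on the project rather than on the person, removing the admin who created one leaves it in place, which is the behavior described above.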
Publish Project Notebooks
You also get integration with GitHub so that you can easily publish project notebooks either to your repository or as a “gist” (a GitHub thing that allows you to store information without having to explicitly create a repository).
You have to turn on GitHub integration in settings, and then every time you want to publish something, you need to create a “token” within Studio and provide the GitHub URL you want to publish to. But it can all be done within Studio with no additional fooling around.
Studio will automatically track changes and other events that occur to both data and models throughout the life of the project. This could be as simple as adding a tag to the asset or as monumental as downloading it. What is tracked is when the event happened and who was involved.
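As a rough picture of what that kind of tracking amounts to, here is a minimal sketch of an activity log that records what happened to an asset, when, and who did it. The `AssetEvent` record, the `record` and `history` helpers, and the asset and user names are assumptions made for this example; Studio's own tracking is built in and may be structured quite differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AssetEvent:
    asset: str           # which data asset or model was touched
    action: str          # what happened, e.g. "tagged" or "downloaded"
    user: str            # who was involved
    timestamp: datetime  # when it happened


log = []


def record(asset, action, user, when=None):
    """Append an event to the log, stamping it with the current UTC time
    unless an explicit timestamp is supplied."""
    event = AssetEvent(asset, action, user, when or datetime.now(timezone.utc))
    log.append(event)
    return event


def history(asset):
    """Return every recorded event for one asset, oldest first."""
    return [event for event in log if event.asset == asset]


record("wages.csv", "tagged", "alice")
record("wages.csv", "downloaded", "bob")
record("churn-model", "retrained", "alice")

for event in history("wages.csv"):
    print(event.timestamp.isoformat(), event.user, event.action)
```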
If you use either the public datasets or the Managed Object Storage, then in both cases you have data that exists on the IBM Cloud. But Studio also allows you to access and use data that was, and remains, external to the cloud where your AI app is running.
Studio lets you create connections to three different types of data sources. The first is data within the IBM Cloud. The second is IBM data sources that are outside of the cloud. One very obvious example here is Db2 data stored on your IBM i. The third is data sources that are external to the IBM world (for example, an Oracle database). With this capability, you can easily access almost any source of data that you need for your project.
In the End
As you can see, IBM Watson Studio provides a wide range of tools that allow you to approach and control your AI project. And while you pay for these tools, the advantage they bring is the level of integration provided.
Next month, we will take a more in-depth look at this product.