Able to process hundreds of thousands of structured or unstructured data streams in real time, stream technology is expected to take analytics and decision-making to new heights.
IBM introduced a new software solution to investors and analysts last week that is the result of five years of research and some 200 patents. Called "stream computing," the solution captures and analyzes thousands of streams of data simultaneously in real time and then delivers the results to business or scientific analysts.
The technology moves business intelligence into the era of Star Trek and already has found a ready audience among European businesses that want to apply the new technology to a short list of challenging business problems, according to IBM.
IBM has dubbed the groundbreaking technology "System S" and is making trial code available to help users better understand the software's capabilities and learn how to take advantage of it in their businesses. The trial code includes developer tools, adapters, and software to test applications. The company also announced it is opening the IBM European Stream Computing Center with headquarters in Dublin, Ireland, that will serve as a hub for research, customer support, and advanced testing.
IBM described System S as having a unique architecture that uses streaming data and advanced mathematical algorithms to create forward-looking analyses from a variety of data sources. The solution isn't limited to numbers; it can analyze unstructured and incompatible data captured from electronic sensors, Web pages, email, blogs, and video.
While traditional computing models analyze stored data and typically can't continuously process huge amounts of incoming data on the fly, System S is designed to make users what IBM calls "real-world aware," able to see and respond to changes across complex systems.
Part of the InfoSphere product line, System S operates for now on Linux and is designed to run on a variety of hardware platforms. It already is in use by several organizations, including the Swedish Institute of Space Physics and Uppsala University to study weather, the Marine Institute of Ireland to study marine ecosystems, and TD Securities to develop an automated options trading system. IBM and the University of Ontario Institute of Technology are testing System S to track subtle changes in critically ill premature babies. IBM foresees the solution being useful in the fields of radio astronomy, energy trading services, financial services, health monitoring, and manufacturing.
The potential of the technology is, in IBM's estimate, enormous. To be able to use computers to rapidly analyze multiple streams of diverse unstructured data in real time to assist decision-making is a capability the human race has not heretofore possessed. The solution allows users to create forward-looking analyses of data from any source and then narrow down precisely what someone is looking for, continuously refining the answer as additional data becomes available.
The system can analyze literally hundreds of thousands of simultaneous data streams, whether they are stock prices, retail sales reports, weather reports--even such things as video news in which the system extracts photo captions and performs speech recognition on a live broadcast. It can transform the data, annotate it, filter it, classify it, and even spit out the requested decision. One can only wonder what the implications of such a system might be for the military and defense establishments. IBM acknowledges that the U.S. Government has been working with it on the project since 2003 to solve the problem of analyzing large streams of data.
Users operate on streaming data in a variety of ways. The Stream Processing Application Declarative Engine (SPADE) provides a language and runtime framework to support streaming applications. Users can create applications without understanding the lower-level, stream-specific operations. SPADE provides built-in operations along with the ability to import streams from outside the system and export results. It also offers a way to extend the underlying system with user-defined operators.
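The general shape of such an operator-based application can be sketched in ordinary Python. This is not SPADE syntax, which the article does not show; it is only an illustration of the idea of importing a stream, applying built-in operations, adding a user-defined operator, and exporting results. All names here (`source`, `filter_op`, `transform_op`, `flag_large_trades`, `sink`, and the record fields) are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict  # hypothetical record type: one item flowing through a stream


def source(records: Iterable[Record]) -> Iterator[Record]:
    """Import a stream from outside; any iterable stands in for a live feed."""
    for record in records:
        yield record


def filter_op(stream: Iterator[Record],
              predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Stand-in for a built-in operation: pass along only matching records."""
    for record in stream:
        if predicate(record):
            yield record


def transform_op(stream: Iterator[Record],
                 fn: Callable[[Record], Record]) -> Iterator[Record]:
    """Stand-in for a built-in operation: rewrite each record as it passes."""
    for record in stream:
        yield fn(record)


def flag_large_trades(stream: Iterator[Record], threshold: int) -> Iterator[Record]:
    """A user-defined operator: annotate each record with a 'large' flag."""
    for record in stream:
        yield {**record, "large": record["volume"] >= threshold}


def sink(stream: Iterator[Record]) -> list:
    """Export results; here they are simply collected into a list."""
    return list(stream)


def run_pipeline(records: Iterable[Record]) -> list:
    """Wire the operators together; each stage consumes the previous stream."""
    ticks = source(records)
    valid = filter_op(ticks, lambda r: r["price"] > 0)
    priced = transform_op(valid, lambda r: {**r, "notional": r["price"] * r["volume"]})
    flagged = flag_large_trades(priced, threshold=1000)
    return sink(flagged)
```

Because each stage is a generator, records flow through one at a time rather than being stored first, which is the property that distinguishes stream processing from batch analysis of stored data.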
A user can pose inquiries to the system that express their needs and interests a la Star Trek. The inquiries are translated by a "semantic solver" into a specification of how raw data can be transformed to meet the requirement. "The runtime environment accepts these specifications, considers the library of available application components, and assembles a job specification to run the required set of components," according to an IBM white paper on the subject.
Users also can develop stream applications through an Eclipse-based Workflow Development Tool Environment that includes an IDE. Users program low-level application components that are interconnected via streams. IBM says the development model will evolve over time to operate directly on SPADE operators rather than on the low-level application components.
The next decade is expected to see data volumes double every two years, while the economic slowdown is causing organizations to be more innovative in their decisions about how to use the data they have. To gain a competitive advantage, organizations will be looking to real-time decision-making as a way to improve bottom-line results. IBM believes that, while businesses have seen a good return from investing in business process automation over the years, the next wave of efficiencies will come from better optimization of the business decision-making process.
Businesses must become more predictive, anticipating and recognizing what is happening in the marketplace and making quick, accurate decisions in response, according to Steve Mills, senior vice president of IBM Software, who briefed investors. Businesses want to protect themselves against risk and find new opportunities, he said. To do that, they will need to process huge amounts of data and be able to perform real-time analytics, Mills said.
Stream computing represents a new paradigm. Instead of running queries against a relatively static set of data, a person will be able to execute what, in effect, is a continuous query with results being updated as the information comes in. This technology will not only represent opportunities for businesses but clearly will open doors for IBM and its Business Partners.
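The difference between a query against static data and a continuous query can be illustrated with a small Python sketch. It is not drawn from System S itself; the function names and the choice of a moving average over a fixed-size window are assumptions made purely for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def static_query(stored_prices: Sequence[float]) -> float:
    """Traditional model: run once over data that has already been stored."""
    return sum(stored_prices) / len(stored_prices)


def continuous_average(prices: Iterable[float], window: int) -> Iterator[float]:
    """Continuous query: emit an updated answer every time a value arrives.

    Keeps only the most recent `window` values, so memory use stays
    constant no matter how long the stream runs.
    """
    recent = deque(maxlen=window)
    total = 0.0
    for price in prices:
        if len(recent) == window:
            total -= recent[0]  # oldest value is about to be evicted
        recent.append(price)
        total += price
        yield total / len(recent)
```

The static query returns a single answer for a fixed data set, while the continuous version yields a fresh result for each incoming value, for example `for avg in continuous_average(feed, window=50): ...`, where `feed` is any iterable of prices.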