Whatever the reason, the sharing of information among these disparate applications and unconnected data stores is often difficult, generally cumbersome and sometimes jury-rigged, if it is done at all. This is a problem. Easy access to integrated information is typically a requirement for productive operations and effective business analysis. What's more, the redundant manual entry of data into multiple databases is expensive and increases the potential for error.
These are the kinds of issues that Vision replicate1, from Vision Solutions, Inc., is designed to address.
Vision replicate1 falls into the class of software known as data replication infrastructure. However, the word "replication" may be misleading. The definition of replicate is to make an exact copy, but that is not always what Vision replicate1 is called upon to do—although it can do that as well. For one thing, the source and target database(s) may use different DBMSs. What's more, before sending it off to the other database(s), Vision replicate1 can transform the captured data, for example, to accommodate differing data types on the target database or to address differing user requirements, such as changing product names to accommodate regional differences. Thus, the transmitted data that is loaded into the target database is a replica of the underlying meaning of the source data but not necessarily of its form. For simplicity, the remainder of this article will use the word "replication" to refer to this copying of data content, regardless of the end form.
Vision replicate1 can perform its replication functions without the need for any programming on your part unless you want to apply custom-built data transformations.
The above discussion provides a high-level view of Vision replicate1, but a more detailed look at its features and functions is necessary to understand the value that your organization can derive from it.
Vision replicate1, which is written in Java, is very portable. It can run under a variety of operating systems, including Windows NT, XP, 2000, and 2003; HP-UX; Solaris; AIX; Linux; OS/400; i5/OS; OS/390; and z/OS. And it can move data from and to Microsoft SQL Server, Oracle, DB2, UDB, and Sybase, with support for virtually all of the data types available with each of those databases.
To be clear, Vision replicator1 doesn't run only homogeneously on the single platform of your choice, although it can do that. The point of this discussion is that it is capable of heterogeneous operations, i.e., copying data from, say, a DB2 sales database running on a System i server to an Oracle accounting database running under Solaris.
In addition, the database schemas do not have to be the same on the source and target. For instance, the product_number column in the catalog table in the source database can be mapped to the prodno column in the products table on the target. Or a single table can be split, with some of its columns being mapped to one target table and some to another. Likewise, provided they contain appropriate keys, tables can be joined, with the columns from two source tables being mapped into a single target table.
Vision replicate1 can move data between servers in both directions simultaneously. As it does so, it checks for conflicts that arise when the same data is modified simultaneously in different locations. When it detects a data collision, Vision replicate1 will, depending on the parameters you set when you configure it, either automatically resolve the conflict using a rule you've specified or flag the conflict for manual resolution.
Unidirectional and bidirectional replication is supported for all topologies, including one-to-one, one-to-many, many-to-one, many-to-many, and cascade. Using a cascade topology, replication proceeds through a chain of servers. This might be used, for example, by a U.S.-based global company that wants to reduce transmission costs by setting up a U.S. hub and replicating data from there to a European hub. The European hub can then assume responsibility for distributing the data to various European branch offices, rather than having the U.S. hub transmit to all of them individually.
Data partitioning—a feature that the industry also calls data selection, data filtering, or conditional data distribution—allows you to restrict replication to specific rows and columns. For example, the company telephone directory might be fed from a human resources database by replicating only the name, department, and phone number columns. Or consider a large retailer that maintains its enterprise-wide product catalog centrally, but not all products are sold in all regions. It could reduce catalog access bandwidth requirements by replicating just the relevant rows to local regional catalogs in near real-time, without the need for operator intervention.
Complex Boolean expressions can be used to define the row selection criteria.
Not all people need data in the same way. Product databases in the U.S. may include measurements in pounds and inches, but the company's European product database may use kilograms and centimeters. Product names may be in English in England, but in French in France. Some of a company's offices may want to store dates in mmddyyyy format, but offices in other countries may want them stored in ddmmyyyy or yyyymmdd format.
Vision replicate1 can handle these differences by applying transformations to data before it is copied to the target database. Included are more than 40 pre-built transformations that can be applied without the need for any programming. You can also develop your own transformations using Java.
Column mapping allows columns to be split, merged, or derived before being sent to the target database. Column splitting can be used to, say, split telephone numbers into area codes and local numbers so the area code can be used as a geographic filter. Column merging might be used to, for example, merge separate first_name and last_name columns into a single name column. The derived columns feature can be used to perform tasks such as using ZIP codes or telephone area codes to create a sales district column on the target table when it doesn't exist on the source.
Vision replicate1 is designed to guarantee that data will be delivered to and loaded onto the target database. When it detects that a change has been made to the set of data selected for replication, it captures that change and stores it in a queue on the source machine. The data is not deleted from the queue until the target confirms that the data has been received and applied to the database. If the target server or the network connection to it is down, Vision replicate1 holds the data updates in its queue until the target becomes available again.
Extensive Monitoring and Control
Three tools are available to monitor and control the replication processes:
- OmniMonCon, a graphical user interface
- OmniConsole, a command-line application
- WebMonCon, a Web-based interface
You can use these tools to start and stop replication processes and journal managers, view the status of the change capture processes, monitor replication queues, display logs, and more.
Data Movement Methods
Vision replicate1 provides a choice of data movement methods. Replication mode captures changes as they happen on the source database and, typically, transmits them immediately to the target, although you can also schedule the apply process to occur at a specific time or periodically. Because only changes are transmitted, this method is suitable for facilitating near real-time synchronization of the source and target databases.
Copy mode automates the capture, distribution, and application of snapshots of tables or subsets of tables. It can apply the snapshot as a complete refresh, append it to existing data, or merge incoming data with existing rows.
Synchronization mode combines the replication and copy modes. When you select this mode, Vision replicate1 initiates a snapshot copy. While that is being processed, the software captures changes made on the source database without applying them to the target. When the snapshot copy process completes, you can initiate the replication apply process to load all of the captured changes into the source database.
Data distribution and sharing is defined and controlled through a model that is stored in a metadata repository that defines data sources, data targets, selection criteria, paths the data should follow, and the transformations (if any) that should be applied.
There are two ways to build and maintain the model. You can use Replication Schema Language (RSL) to define all of the parameters of the model manually. Alternatively, Vision provides a Windows-based GUI, OmniDirector, that you can use to define the model. Everything that you can do using RSL, you can also do by pointing and clicking in OmniDirector.
In order for OmniDirector to provide its full functionality, it must be able to connect to all source and target systems and it must be granted authority to query the participating databases so that it can build lists of available tables and columns. Once those lists are built, you then select target and source databases, tables, columns, and paths from the lists to create the model.
When you finish building the model, OmniDirector validates it for completeness and accuracy. The model is then committed as production metadata on each data replication source server. The model repository at the source is stored in the native DBMS of the server and contains only the metadata required at that server, not the entire model. Because every replication source server has its own copy of the necessary metadata, even if any one server fails or is taken offline, the remainder can continue their replication processes, with changes stored in queues if necessary. Thus, there is no single point of failure that can stop replication from functioning servers.
RSL is particularly useful when it comes time to migrate your replication model. You can move a model from development or test simply by exporting the RSL, changing the server names, importing the updated RSL, and committing it. This eliminates the need to re-create in production what was already created in test, thus avoiding extra work and the potential for human error whenever information is rekeyed.
Case in Point
Vision replicate1 is often used to consolidate data from various operational systems into a data warehouse. One organization that is using it for this purpose is Florida's Broward County Public Schools, the sixth largest school district in the U.S. Broward County Public Schools serves approximately 265,000 students in more than 260 schools and education centers. It employs approximately 41,900 people.
The school district makes extensive use of the data at its disposal, including attendance records, grades, test scores, medical histories, and contact information for parents and guardians. Broward uses Vision replicate1 to copy data from a variety of sources into its data warehouse. More than 200,000 students, parents, teachers, and school administrators have been given login IDs that grant them access to portions of the data warehouse as appropriate.
A key to success in the school district's business intelligence efforts is the timeliness of the data in the warehouse. Vision replicate1 helps to ensure that timeliness. "About 85 percent of our users are on Macs, and the other 15 percent on PCs," explained Phyllis Chasser, Ph.D., senior data warehouse analyst, Broward County Public Schools. "All of the student transactional applications as well as our data warehousing activities run on IBM System i. With [Vision] replicate1, we can bring data over rapidly from a wide range of data sources, including SQL tables, DB2 tables, flat files, SAP, Oracle, and others—often with just a point, click, and drag."
Easier access to more timely data has allowed the school district to enhance collaboration among teachers and administrators and improve decision-making processes. "A good example is the fact that our district demographics change on a regular basis; we have a lot of students moving from school to school during the year," said Chasser. "We used to update all of our records overnight, but now, some of the most essential data, such as student registration and attendance, is real-time. So, by the time the student reports to the new school, everything is already there, and if they move during the middle of today, the records are there immediately. Students can be placed right away in the specific classes they need. If they have problems with English or they need to be on the lunch program, there's no delay."
Broward County School District's innovative use of data warehousing has captured attention. The school district is one of five finalists in the "Creating an Agile BI Infrastructure" category of Business Intelligence Perspective's "Best Practices" Awards. At time of writing, the top honoree had not yet been announced.
Vision Solutions, Inc.
17911 Von Karman Ave.
Irvine, CA 92614
MC Press Online