  Project Description
  The world today is full of information sources, all with their own ways of representing data. A common problem is that data stored in one representation in some data source is needed in a different representation for another purpose. As a simple example, the owner of a data source may want to publish her data using a specific XML DTD, though it is stored in some different (legacy) format. As another example, data warehouses bring data from one or more sources together in a new form that allows for efficient decision support queries. Today, such situations are for the most part dealt with manually, by an expert user who has knowledge of both the source and target representations. Converting from one data representation to another is a time-consuming and labor-intensive process, with few tools available to ease the task.
 The Clio project, begun in 1999, is a joint project between the IBM Almaden Research Center and the University of Toronto. Clio's goal is to radically simplify information integration by providing tools that help in automating and managing one challenging piece of that problem: the conversion of data between representations. Clio pioneered the use of schema mappings, specifications that describe the relationship between data in two heterogeneous schemas. From this high-level, non-procedural representation, Clio can automatically generate either a view, to reformulate queries against one schema into queries on another for data integration, or code, to transform data from one representation to the other for data exchange.
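 To make the idea of a schema mapping more concrete, here is a minimal Python sketch. It is only an illustration under strong simplifying assumptions: a mapping is reduced to a list of field-to-field correspondences, and all names (Correspondence, generate_transform, generate_view_sql, legacy_emp, employee) are hypothetical. It does not reflect Clio's actual mapping language or generation algorithms; it only shows how one declarative specification can yield both a view definition (data integration) and transformation code (data exchange).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Correspondence:
    """Hypothetical, simplified mapping element: one source field feeds one target field."""
    source_field: str
    target_field: str


def generate_transform(mapping: List[Correspondence]) -> Callable[[Dict], Dict]:
    """Data exchange: build a function that converts a source record to a target record."""
    def transform(record: Dict) -> Dict:
        # Copy only the mapped fields, renaming them to the target schema.
        return {c.target_field: record[c.source_field] for c in mapping}
    return transform


def generate_view_sql(mapping: List[Correspondence],
                      source_table: str, target_view: str) -> str:
    """Data integration: build a view that exposes the source under the target schema."""
    columns = ", ".join(f"{c.source_field} AS {c.target_field}" for c in mapping)
    return f"CREATE VIEW {target_view} AS SELECT {columns} FROM {source_table}"


# Illustrative mapping between an assumed legacy table and an assumed target schema.
mapping = [
    Correspondence("emp_name", "name"),
    Correspondence("emp_sal", "salary"),
]

transform = generate_transform(mapping)
target_record = transform({"emp_name": "Ada", "emp_sal": 100, "dept": "R&D"})
view_sql = generate_view_sql(mapping, "legacy_emp", "employee")
```

 The point of the sketch is that the mapping itself stays non-procedural: the same list of correspondences drives both generators, so a change to the mapping is made in one place.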
 The project is supported by an IBM University Partnership Award, a National Science Foundation CAREER award, a Presidential Early Career Award for Scientists and Engineers (PECASE), and NSERC.
  Test Schemas
 Here is a list of some of the schemas that have been used to test Clio. We are making them publicly available to help in comparing schema integration and schema mapping solutions. Nested schemas are presented in XML-Schema. Relational schemas are given as DB2 DDL statements and/or XML-Schemas for convenience.
Test Data Source                   Schema 1                      Schema 2
Financial (Expense/Statistics DB)  Nested (XML-Schema)           Nested (XML-Schema)
DBLP                               Nested (XML-Schema)           Nested (XML-Schema)
TPC-H                              Relational (XML-Schema)       Nested (XML-Schema)
GeneX                              Relational (XML-Schema)       Nested (XML-Schema)
Mondial                            Relational (DB2)              Nested (XML-Schema)
Amalgam                            Relational (DB2, XML-Schema)  Relational (DB2, XML-Schema)
 A few screenshots are available on IBM Almaden's site.
For a demo or information on code availability, please contact Howard Ho.
