The Database Group @ University of Toronto  The Clio Project
         
 
 
  Project Description
 
  The world today is full of information sources, each with its own way of representing data. A common problem is that data stored in one representation in some data source is needed in a different representation for another purpose. As a simple example, the owner of a data source may want to publish her data using a specific XML DTD, even though it is stored in a different (legacy) format. As another example, data warehouses bring data from one or more sources together in a new form that allows for efficient decision support queries. Today, such situations are for the most part handled manually by an expert user who knows both the source and target representations. Converting from one data representation to another is a time-consuming and labor-intensive project, with few tools available to ease the task.
 
 The Clio project is a joint project between the IBM Almaden Research Center and the University of Toronto that began in 1999. Clio's goal is to radically simplify information integration by providing tools that help automate and manage one challenging piece of that problem: the conversion of data between representations. Clio pioneered the use of schema mappings: specifications that describe the relationship between data in two heterogeneous schemas. From this high-level, non-procedural representation, Clio can automatically generate either a view, which reformulates queries against one schema into queries on another (data integration), or code, which transforms data from one representation to the other (data exchange).
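
 To make the idea of a declarative schema mapping concrete, here is a minimal Python sketch. It is not Clio's mapping language or its generation algorithm. The schemas, attribute names, the `Correspondence` class, and the `exchange` function are all hypothetical, chosen only to show how a non-procedural specification can drive a data exchange step from a flat source to a nested target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One value correspondence: a source attribute feeds a target attribute."""
    source_attr: str
    target_attr: str


# Hypothetical mapping (an assumption for illustration, not Clio syntax):
# flat employee rows are regrouped into departments with nested staff lists.
MAPPING = {
    "group_by": "dept_name",        # source attribute that keys each target group
    "group_label": "department",    # target attribute holding the group key
    "members_label": "staff",       # target attribute holding the nested records
    "correspondences": [
        Correspondence("emp_name", "name"),
        Correspondence("emp_salary", "salary"),
    ],
}


def exchange(rows, mapping):
    """Interpret the mapping to build nested target data from flat source rows."""
    groups = {}
    for row in rows:
        key = row[mapping["group_by"]]
        # Copy each mapped source value into its target attribute.
        member = {c.target_attr: row[c.source_attr]
                  for c in mapping["correspondences"]}
        groups.setdefault(key, []).append(member)
    return [
        {mapping["group_label"]: key, mapping["members_label"]: members}
        for key, members in groups.items()
    ]


source_rows = [
    {"emp_name": "Ada", "emp_salary": 120, "dept_name": "Research"},
    {"emp_name": "Lin", "emp_salary": 100, "dept_name": "Research"},
    {"emp_name": "Sam", "emp_salary": 90, "dept_name": "Sales"},
]
target = exchange(source_rows, MAPPING)
```

 The mapping itself says nothing about how to iterate or group. Changing the target shape means editing the specification rather than the transformation code, which is the property that makes such mappings suitable for generating views or exchange code.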
 
 The project is supported by an IBM University Partnership Award, a National Science Foundation CAREER award, a Presidential Early Career Award for Scientists and Engineers (PECASE), and NSERC.
 
  People
 
 
  Selected Publications
 
  • Clio: Schema Mapping Creation and Data Exchange
    New in 2009: retrospective on the Clio project.
    Ron Fagin, Laura Haas, Mauricio Hernández, Renée J. Miller, Lucian Popa, Yannis Velegrakis.
    To appear in book Conceptual Modeling: Foundations and Applications, editors Alexander Borgida, Vinay Chaudhri, Paolo Giorgini and Eric Yu, Springer 2009.
  • Creating Nested Mappings with Clio (Demonstration)
    Mauricio Hernández, Howard Ho, Lucian Popa, Ariel Fuxman, Renée J. Miller, Takeshi Fukuda and Paolo Papotti.
    In Proceedings of the International Conference on Data Engineering (ICDE), 2007.
  • Nested Mappings: Schema Mapping Reloaded
    Ariel Fuxman, Mauricio Hernández, Howard Ho, Renée J. Miller, Paolo Papotti and Lucian Popa.
    In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2006.
  • Data Exchange: Semantics and Query Answering
    Ron Fagin, Phokion Kolaitis, Renée J. Miller and Lucian Popa
    In Proceedings of the International Conference on Database Theory (ICDT), 2003.
  • Translating Web Data
    Lucian Popa, Yannis Velegrakis, Mauricio Hernández, Renée J. Miller and Ron Fagin.
    In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), 2002.
  • Translating Web Data
    Lucian Popa, Yannis Velegrakis, Renée J. Miller, Mauricio Hernández and Ron Fagin
    Technical Report CSRI 441, University of Toronto, 2002.
  • Mapping XML and Relational Schemas with CLIO
    Lucian Popa, Mauricio A. Hernández, Yannis Velegrakis and Renée J. Miller
    System Demonstration, IEEE Data Engineering Conference, 2002.
  • Data-Driven Understanding and Refinement of Schema Mappings
    Ling-Ling Yan, Renée J. Miller, Laura M. Haas and Ron Fagin.
    In Proceedings of the ACM SIGMOD International Conference, 2001.
  • Clio: A Semi-Automatic Tool For Schema Mapping.
    Mauricio Hernández, Renée J. Miller and Laura M. Haas.
    System Demonstration, ACM SIGMOD International Conference, 2001.
  • The Clio Project: Managing Heterogeneity
    Renée J. Miller, Mauricio Hernández, Laura M. Haas, Ling-Ling Yan, C. T. Howard Ho, Ron Fagin and Lucian Popa.
    In SIGMOD Record, 2001, 30(1): 78-83.
  • Schema Mapping as Query Discovery
    Renée J. Miller, Laura M. Haas and Mauricio Hernández.
    In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2000, 77-88.
  • Transforming Heterogeneous Data with Database Middleware: Beyond Integration
    Laura M. Haas, Renée J. Miller, B. Niswonger, Mary Tork Roth, Peter M. Schwarz and Edward L. Wimmers.
    IEEE Data Engineering Bulletin 1999, 22(1):31-36.
  • Using Schematically Heterogeneous Structures
    Renée J. Miller.
    In Proceedings of the ACM SIGMOD International Conference on the Management of Data, 1998, 27(2):189-200.
 
 
  Test Schemas
 Here is a list of some of the schemas that have been used to test Clio. We are making them publicly available to help in comparing schema integration and schema mapping solutions. Nested schemas are presented in XML-Schema. Relational schemas are given as DB2 DDL statements and/or XML-Schemas for convenience.
  
 
Test Data Source                    Schema 1                        Schema 2
Financial (Expense/Statistics DB)   Nested (XML-Schema)             Nested (XML-Schema)
DBLP                                Nested (XML-Schema)             Nested (XML-Schema)
TPC-H                               Relational (XML-Schema)         Nested (XML-Schema)
GeneX                               Relational (XML-Schema)         Nested (XML-Schema)
Mondial                             Relational (DB2)                Nested (XML-Schema)
Amalgam                             Relational (DB2, XML-Schema)    Relational (DB2, XML-Schema)
 
  Clio@Almaden
 A few screenshots are included on IBM Almaden's site.
For a demo or information on code availability, please contact Howard Ho (lastname @ almaden.ibm.com).
 
Copyright © 2006 The Database Group @ University of Toronto | Last Updated July 10, 2007