The Database Group @ University of Toronto
The xCurator Project
  xCurator is now open source, but with limited support and documentation.
See the latest release and the Web interface code on GitHub.
   Project Description
   Semistructured data is abundant on the Web. Many Web data sources and APIs make their data available in XML, JSON, or a domain-specific semistructured format, with the goal of making the data easily accessible and usable by Web application developers. Although such formats are more machine-processable than plain text documents, managing and analyzing this data at large scale is often nontrivial. This is mainly because such formats lack a well-defined structure and clear semantics, which can also cause the quality of the data to degrade over time.
 In the xCurator project, our goal is to add structure to such data and enhance its quality by extracting entities along with their types and associations, identifying and merging duplicate entities, linking related entities, and publishing the results on the Web, all in a lightweight, easy-to-use, and scalable framework that effectively incorporates user feedback in all phases. We have designed our system based on our experience in managing large volumes of (user-generated) data on the Web in several real-world applications.
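 To make the first two steps concrete, here is a minimal sketch of entity extraction and duplicate merging over a small JSON input. It is not xCurator's actual code or API: the sample records, the function names (extract_entities, normalize, merge_duplicates), and the rule that each nested JSON object becomes an entity typed by its parent key are all assumptions made for illustration. Linking and publishing are not shown.

```python
import json
from collections import defaultdict

# Hypothetical input: two records from a semistructured JSON source.
RAW = """
[
  {"trial": {"id": "T1", "sponsor": {"name": "Acme Labs "}}},
  {"trial": {"id": "T2", "sponsor": {"name": "acme labs"}}}
]
"""


def extract_entities(node, type_name="root"):
    """Yield (type, attributes) pairs; each nested object becomes its own entity.

    Assumption for this sketch: the key under which an object appears
    is used as its entity type.
    """
    if isinstance(node, list):
        for item in node:
            yield from extract_entities(item, type_name)
    elif isinstance(node, dict):
        # Scalar fields are attributes of the current entity.
        attrs = {k: v for k, v in node.items() if not isinstance(v, (dict, list))}
        if attrs:
            yield type_name, attrs
        # Nested objects and arrays are visited as separate entities.
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                yield from extract_entities(value, key)


def normalize(value):
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(str(value).lower().split())


def merge_duplicates(entities, key_attr):
    """Group entities of the same type whose key attribute matches after normalization."""
    merged = defaultdict(list)
    for type_name, attrs in entities:
        if key_attr in attrs:
            merged[(type_name, normalize(attrs[key_attr]))].append(attrs)
    return dict(merged)


entities = list(extract_entities(json.loads(RAW)))
sponsors = merge_duplicates([e for e in entities if e[0] == "sponsor"], "name")
```

 Here the two sponsor records differ only in case and trailing whitespace, so they fall into one group. Exact matching on a normalized string is the simplest possible duplicate test; a real system would likely need approximate string matching and user feedback to decide which records to merge.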
   Live Applications
  • LinkedCT
    The Linked Clinical Trials (LinkedCT) data set is a Linked Data source of clinical trials data.
  • BibBase3
    The BibBase data server aims to create high-quality Linked Data out of the BibTeX files of BibBase's users.
   Publications
  • Linking Semistructured Data on the Web
    S. Hassas Yeganeh, O. Hassanzadeh, and R.J. Miller.
    Proceedings of the 14th International Workshop on the Web and Databases (WebDB 2011) at SIGMOD 2011
  • BibBase Triplified
    O. Hassanzadeh, R. Xin, C. Fritz, Y. Yang, J. Du, M. Zhao, and R.J. Miller.
    Proceedings of the 6th International Conference on Semantic Systems, Graz, Austria, September 1-3, 2010. Triplification Challenge Contestant. Honorary Mention in the Open Track.
   Related Projects
  • LinQuer: Linkage Query Writer
  • Stringer: Duplicate Detection System for String Data
