Try a preview of xCurator's mapping interface here.

Project Description

Semistructured data is abundant on the Web. Many Web data sources and APIs
publish their data in XML, JSON, or a domain-specific semistructured format,
with the goal of making the data easily accessible and usable by Web
application developers. Although such formats are more machine-processable
than plain-text documents, managing and analyzing such data at large scale is
often nontrivial. This is mainly due to the lack of a well-defined structure
and clear semantics in these formats, which can also degrade the quality of
the data over time.
In the xCurator project, our goal is to add structure to such data and enhance
its quality. We do this by extracting entities, their types, and their
associations; identifying and merging duplicate entities; linking related
entities; and publishing the results on the Web. All of this happens within a
lightweight, easy-to-use, and scalable framework that effectively incorporates
user feedback in all phases. We designed the system based on our experience
managing large volumes of (user-generated) data on the Web in several
real-world applications.
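This page does not spell out xCurator's algorithms. The following is a minimal, hypothetical Python sketch of two of the steps named above: extracting entities and their types and associations from a nested JSON record, and merging duplicate entities. It assumes a deliberately naive scheme:

- every nested JSON object is an entity whose type is the key it appears under;
- scalar fields become literal attributes;
- two entities of the same type are duplicates if their `name` values match after lower-casing and whitespace normalization.

The names `extract_triples`, `merge_duplicates`, the `http://example.org/resource/` base URI, and the sample record are all illustrative. They are not xCurator's API or data, and `"rdf:type"` is used here as a plain string rather than through an RDF library.

```python
"""Hypothetical sketch: entity/type extraction and duplicate merging for JSON.

Not xCurator's implementation; names, URIs, and the sample record are illustrative.
"""
from itertools import count


def extract_triples(record, root_type="Record", base="http://example.org/resource/"):
    """Flatten a nested JSON object into (subject, predicate, object) triples.

    Assumed naive scheme: every JSON object becomes an entity whose type is the
    key it appears under; scalar fields become literal attributes and nested
    objects (or lists of objects) become links to child entities.
    Leaf values are assumed to be JSON scalars (no lists nested in lists).
    """
    triples = []
    ids = count(1)  # sequential entity numbers, starting at the root

    def visit(obj, etype):
        subject = f"{base}{etype}/{next(ids)}"
        triples.append((subject, "rdf:type", etype))
        for key, value in obj.items():
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, dict):
                    # Child entity first, then the association that links to it.
                    triples.append((subject, key, visit(item, key)))
                else:
                    triples.append((subject, key, item))
        return subject

    visit(record, root_type)
    return triples


def merge_duplicates(triples, key_attr="name"):
    """Merge same-type entities whose key attribute matches after lower-casing
    and whitespace normalization (exact match on the normalized value)."""
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    canonical = {}  # (type, normalized key value) -> first subject seen
    same_as = {}    # duplicate subject -> canonical subject
    for s, p, o in triples:
        if p == key_attr and isinstance(o, str):
            k = (types.get(s), " ".join(o.lower().split()))
            first = canonical.setdefault(k, s)
            if first != s:
                same_as[s] = first

    def resolve(term):
        # Rewrite duplicate entity URIs to their canonical URI; leave others alone.
        return same_as.get(term, term) if isinstance(term, str) else term

    merged, seen = [], set()
    for s, p, o in triples:
        t = (resolve(s), p, resolve(o))
        if t not in seen:  # drop triples that became identical after merging
            seen.add(t)
            merged.append(t)
    return merged


# Illustrative record (not real clinical trial data).
trial = {
    "id": "trial-001",
    "condition": [{"name": "Asthma"}, {"name": "asthma "}],
    "location": {"name": "Example Hospital", "city": "Toronto"},
}
merged = merge_duplicates(extract_triples(trial, root_type="Trial"))
for triple in merged:
    print(triple)
```

In this run the two `condition` entities collapse into one, which keeps both spellings of its name. Exact matching on a normalized key is only a stand-in. Approximate string matching, in the spirit of the Stringer project listed under Related Projects, together with user confirmation, would be needed for real data.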

Live Applications

- LinkedCT
The Linked Clinical Trials (LinkedCT)
data set is a Linked Data source of clinical trials data.
- BibBase3
The BibBase data server aims to create high-quality Linked Data out of the
BibTeX files of BibBase's users.

People


Publications

- Linking Semistructured Data on the Web
S. Hassas Yeganeh, O. Hassanzadeh, and R.J. Miller.
Proceedings of the 14th International Workshop on the Web and Databases
(WebDB 2011) at SIGMOD 2011

Related Projects

- LinQuer: Linkage Query Writer
- Stringer: Duplicate Detection System for String Data

Datasets and Experimental Results