| Research Vision |
|
|
Big data has been described as large, dynamic, or highly heterogeneous data that requires new forms of processing to enable enhanced decision making and insight discovery. Many of these new forms of processing can be described as data curation, that is, the care of data to ensure it maintains its value over time.
|
|
| Projects |
|
Data Quality: Discovering and Repairing Errors in Data
iBench: A Scalable Benchmark Generator for Supporting Empirical Integration Research
|
|
| People |
|
- Lead Investigator
Renée J. Miller
(Professor, University of Toronto)
- Postdoctoral Fellows
Patricia Arocena
(Researcher and Associate Director, NSERC Business Intelligence Network)
- M.Sc. and PhD Graduate Students
Christina Christodoulaki (M.Sc. Student, University of Toronto)
Jiang Du (PhD Student, University of Toronto)
Farzaneh Mahdisoltani (PhD Student, University of Toronto)
Fatemeh Nargesian (PhD Student, University of Toronto)
Natasha Prokoshyna (PhD Student, University of Toronto)
Javed Siddique (PhD Student, University of Toronto)
Pooya Saadat Panah (PhD Student, University of Toronto)
Erkang Zhu (M.Sc. Student, University of Toronto)
|
| Publications |
| Talks |
|
- Big Data Curation
Renée J. Miller.
- University of Paris V, France, May 2014.
- Database Group, University of California San Diego, CA, USA, 2014.
- Information Sciences Institute (ISI), University of Southern California, Marina del Rey, CA, USA, 2013.
- Computer Science Department, University of Rochester, NY, USA, 2013.
- Data Science Institute, University of California Santa Cruz, CA, USA, 2013.
- DIMACS Workshop on Big Data Integration, Rutgers University, June 21st, 2013.
- The David R. Cheriton Distinguished Lecture Series, University of Waterloo, January 30th, 2013.
Abstract: Big data has been described as large, dynamic, or highly heterogeneous data that requires new forms of processing to enable enhanced decision making and insight discovery. Many of these new forms of processing can be described as data curation, that is, the care of data to ensure it maintains its value over time. In this talk, I describe our experience in curating several open data sets. I give an overview of how we have adapted some of the traditional solutions for aligning data and creating semantics to account for the three V's of big data: volume, velocity, and variety.
- On Schema Discovery
Renée J. Miller.
- Distinguished Lecture, School of Computing, Queen's University, 2012.
- Cyber Center Seminar Series, Purdue University, 2012.
- Keynote, IEEE International Conference on Data Mining (ICDM), December 12th, 2011.
Abstract: Structured data is distinguished from unstructured data by the presence of a schema describing the logical structure and semantics of the data. The schema is the means through which we understand and query the underlying data. Schemas enable data independence. In this talk, I consider new challenges in the old problem of schema discovery. I'll discuss the changing role of schemas from prescriptive to descriptive. I'll use examples from Web data publishing and from Big Data Analytics to motivate the automation of schema discovery and maintenance.
|
| Funding |
|
- NSERC Big Data Curation
Proposal October 2012. Funding Start Date April 1, 2013.
Discovery Grant $220,000. Merit-based Accelerator Grant $120,000.
- NSERC BIN
Proposal June 2007. Funding Start Date April 1, 2009.
NSERC $5,000,000. Industry (SAP, IBM, Sybase, Palomino, Zerofootprint) $1,025,000.