The Database Group @ University of Toronto  Big Data Curation
 
 Research Vision
 
 

Big data has been described as large, dynamic, or highly heterogeneous data that requires new forms of processing to enable enhanced decision making and insight discovery. Many of these new forms of processing can be described as data curation, that is, the care of data to ensure it maintains its value over time.

 
 Projects
 
    Data Quality: Discovering and Repairing Errors in Data
    iBench: A Scalable Benchmark Generator for Supporting Empirical Integration Research
 
 People
 
 
  Publications
 Talks
 
  • Big Data Curation
    Renée J. Miller.
    • University of Paris V, France, May 2014.
    • Database Group, University of California San Diego, CA, USA, 2014.
    • Information Sciences Institute (ISI), University of Southern California, Marina del Rey, CA, USA, 2013.
    • Computer Science Department, University of Rochester, NY, USA, 2013.
    • Data Science Institute, University of California Santa Cruz, CA, USA, 2013.
    • DIMACS Workshop on Big Data Integration, Rutgers University, June 21st, 2013.
    • The David R. Cheriton Distinguished Lecture Series, University of Waterloo, January 30th, 2013.
    Abstract: Big data has been described as large, dynamic, or highly heterogeneous data that requires new forms of processing to enable enhanced decision making and insight discovery. Many of these new forms of processing can be described as data curation, that is, the care of data to ensure it maintains its value over time. In this talk, I describe our experience in curating several open data sets. I give an overview of how we have adapted some of the traditional solutions for aligning data and creating semantics to account for the three V's of big data: volume, velocity, and variety.
  • On Schema Discovery
    Renée J. Miller.
    • Distinguished Lecture, School of Computing, Queen's University, 2012.
    • Cyber Center Seminar Series, Purdue University, 2012.
    • Keynote, IEEE International Conference on Data Mining (ICDM), December 12th, 2011.
    Abstract: Structured data is distinguished from unstructured data by the presence of a schema describing the logical structure and semantics of the data. The schema is the means through which we understand and query the underlying data. Schemas enable data independence. In this talk, I consider new challenges in the old problem of schema discovery. I'll discuss the changing role of schemas from prescriptive to descriptive. I'll use examples from Web data publishing and from Big Data Analytics to motivate the automation of schema discovery and maintenance.
 
 
 Funding
 
  • NSERC Big Data Curation
    Proposal October 2012. Funding Start Date April 1, 2013.

    Discovery Grant $220,000. Merit-based Accelerator Grant $120,000.

  • NSERC BIN
    Proposal June 2007. Funding Start Date April 1, 2009.

    NSERC $5,000,000. Industry (SAP, IBM, Sybase, Palomino, Zerofootprint) $1,025,000.

 
 
Copyright 2014 The Database Group @ University of Toronto | Last Updated May 1, 2014