A new mode of inquiry, problem solving, and decision making has become pervasive in our society, consisting of applying computational and mathematical models to infer actionable information from large quantities of data. This paradigm, often called Big Data Analytics, or simply Big Data, is driving our businesses and holds the key to the next breakthroughs in science and engineering. In this paradigm, timely access to a wealth of data is not sufficient and must be accompanied by sophisticated, intuitive, and scalable analytical tools that will allow people to turn data into knowledge and then informed action. Despite the promise, big data is also a source of angst for society. If we put our decision making in the hands of algorithms and data, how do we know when to trust the results?
My research centers on Big Data Curation, which can be understood through a close analogy with art curation. An art curator is responsible for the selection, acquisition, care, dissemination, and interpretation of works of art, and also commonly conducts research on them. A data curator must do the same with data. For anything but the smallest of data collections, automated tools are a necessity, and it is these data curation tools, and the computational methods that enable them, that I study. Data curators need to use data for specific data management, analysis, or research tasks. To be of value, data must satisfy the requirements of these tasks, not only in terms of coverage (acquiring the right data), but also in terms of quality and provenance (determining the accuracy and authenticity of the data).
My goal is to provide tools that allow a human curator to ensure that data retains or increases in value over time and continues to meet the requirements of a data management task. Curation is essential to ensuring that conclusions drawn from the data are reliable and trustworthy.