Lookup NU author(s): Professor Paolo Missier
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
The ability to measure the use and impact of published datasets is key to the success of the open data / open science paradigm. A direct measure of impact would require tracking data (re)use in the wild, which is difficult to achieve. It is therefore commonly replaced by simpler metrics based on data download and citation counts. In this paper we describe a scenario where it is possible to track the trajectory of a dataset after its publication, and we show how this enables the design of accurate models for ascribing credit to data originators. A Data Trajectory (DT) is a graph that encodes knowledge of how, by whom, and in which context data has been reused, possibly after several generations. We provide a theoretical model of DTs that is grounded in the W3C PROV data model for provenance, and we show how DTs can be used to automatically propagate a fraction of the credit associated with transitively derived datasets back to the original data contributors. We also show this model of transitive credit in action by means of a Data Reuse Simulator. Our hope is that, in the longer term, credit models based on direct measures of data reuse will provide further incentives for data publication. We conclude by outlining a research agenda to address the hard questions of creating, collecting, and using DTs systematically across a large number of data reuse instances in the wild.
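To make the idea of transitive credit concrete, the following is a minimal sketch in Python and not the model defined in the paper. It assumes a simplified Data Trajectory reduced to derivation edges only (in the spirit of the PROV wasDerivedFrom relation), an acyclic graph, a single hypothetical parameter `fraction` for the share of credit passed upstream, and an equal split among the sources of a derived dataset. The function name `propagate_credit` and all dataset names are illustrative assumptions.

```python
from collections import defaultdict


def propagate_credit(derived_from, direct_credit, fraction=0.5):
    """Return the total credit per dataset after transitive propagation.

    derived_from:  dict mapping a dataset to the list of datasets it was
                   derived from (assumed to form an acyclic graph).
    direct_credit: dict mapping a dataset to the credit it earned directly,
                   for example from citations (illustrative values only).
    fraction:      hypothetical share of a dataset's total credit that is
                   passed back to its sources, split equally among them.
    """
    # Invert the derivation edges: children[d] lists datasets derived from d.
    children = defaultdict(list)
    for child, parents in derived_from.items():
        for parent in parents:
            children[parent].append(child)

    total = {}

    def credit(dataset):
        # Memoised recursion: a dataset's total is its direct credit plus
        # the share inherited from every dataset derived from it.
        if dataset not in total:
            inherited = 0.0
            for child in children.get(dataset, []):
                inherited += fraction * credit(child) / len(derived_from[child])
            total[dataset] = direct_credit.get(dataset, 0.0) + inherited
        return total[dataset]

    for dataset in set(direct_credit) | set(derived_from) | set(children):
        credit(dataset)
    return total


# Usage: C was derived from B, which was derived from the original dataset A.
trajectory = {"B": ["A"], "C": ["B"]}
earned = {"A": 0.0, "B": 10.0, "C": 20.0}
print(propagate_credit(trajectory, earned))
```

In this sketch a derived dataset keeps its own credit in full and additionally passes a fraction upstream, so A ends up with credit even though it earned none directly. Whether credit should instead be deducted from the derived dataset, or weighted by how much of each source was reused, is a modelling choice this example does not settle.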
Author(s): Missier P
Publication type: Conference Proceedings (inc. Abstract)
Publication status: Published
Conference Name: 11th International Digital Curation Conference
Year of Conference: 2016
Print publication date: 23/02/2016
Online publication date: 23/02/2016
Acceptance date: 01/01/2016
Date deposited: 19/01/2016
Publisher: Digital Curation Centre