
ePrints

Data trajectories: tracking reuse of published data for transitive credit attribution

Lookup NU author(s): Dr Paolo Missier


Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

The ability to measure the use and impact of published data sets is key to the success of the open data / open science paradigm. A direct measure of impact would require tracking data (re)use in the wild, which is difficult to achieve. It is therefore commonly replaced by simpler metrics based on data download and citation counts. In this paper we describe a scenario where it is possible to track the trajectory of a dataset after its publication, and we show how this enables the design of accurate models for ascribing credit to data originators. A Data Trajectory (DT) is a graph that encodes knowledge of how, by whom, and in which context data has been reused, possibly after several generations. We provide a theoretical model of DTs that is grounded in the W3C PROV data model for provenance, and we show how DTs can be used to automatically propagate a fraction of the credit associated with transitively derived datasets back to the original data contributors. We also show this model of transitive credit in action by means of a Data Reuse Simulator. Our hope is that, in the longer term, credit models based on direct measures of data reuse will provide further incentives for data publication. We conclude by outlining a research agenda to address the hard questions of creating, collecting, and using DTs systematically across a large number of data reuse instances in the wild.
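
To make the idea of transitive credit concrete, here is a minimal Python sketch. It is not the paper's model or its Data Reuse Simulator: the function name `propagate_credit`, the `share` parameter, the equal split among sources, and the choice to copy rather than transfer credit are all assumptions made for illustration. The derivation graph is a plain dictionary standing in for PROV-style "was derived from" relations, and it is assumed to be acyclic.

```python
def propagate_credit(derived_from, base_credit, share=0.5):
    """Propagate credit from derived datasets back to their sources.

    derived_from: dict mapping a dataset to the list of datasets it was
                  derived from (hypothetical stand-in for a Data Trajectory;
                  assumed to be acyclic).
    base_credit:  dict mapping a dataset to the credit it earned directly.
    share:        assumed fraction of a dataset's total credit that is
                  passed back, split equally among its sources.
    """
    nodes = set(base_credit) | set(derived_from)
    for sources in derived_from.values():
        nodes.update(sources)

    total = {n: float(base_credit.get(n, 0.0)) for n in nodes}

    # pending[n] counts derivation edges pointing at n whose derived
    # dataset has not been processed yet.
    pending = {n: 0 for n in nodes}
    for sources in derived_from.values():
        for s in sources:
            pending[s] += 1

    # Process each dataset only after everything derived from it,
    # so its total credit is final before it is passed upstream.
    ready = [n for n in nodes if pending[n] == 0]
    while ready:
        d = ready.pop()
        sources = derived_from.get(d, [])
        for s in sources:
            total[s] += share * total[d] / len(sources)
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return total


# Example: C was derived from B, which was derived from A.
# Only C earned credit directly.
credit = propagate_credit({"C": ["B"], "B": ["A"]}, {"C": 8.0}, share=0.5)
print(credit["A"], credit["B"], credit["C"])
```

In this sketch a derived dataset keeps its own credit while its sources receive an additional fraction, so credit decays by the factor `share` with each generation. A model that conserves total credit, or that weights sources unequally, would need a different update rule.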


Publication metadata

Author(s): Missier P

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 11th International Digital Curation Conference

Year of Conference: 2016

Print publication date: 23/02/2016

Online publication date: 23/02/2016

Acceptance date: 01/01/2016

Publisher: Digital Curation Centre
