Provenance annotation and analysis to support process re-computation

Newcastle University author(s): Dr Jacek Cala, Professor Paolo Missier

Downloads

Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Abstract

© Springer Nature Switzerland AG 2018. Many resource-intensive analytics processes evolve over time as new versions of the reference datasets and software dependencies they use are released. We focus on scenarios in which any version change has the potential to affect many outcomes, as is the case, for instance, in high-throughput genomics, where the same process is used to analyse large cohorts of patient genomes, or cases. As any version change is unlikely to affect the entire population, an efficient strategy for restoring the currency of the outcomes requires first identifying the scope of a change, i.e., the subset of affected data products. In this paper we describe a generic and reusable provenance-based approach to address this scope discovery problem. It applies to a scenario where the process consists of complex hierarchical components, where different input cases are processed using different version configurations of each component, and where separate provenance traces are collected for the executions of each of the components. We show how a new data structure, called a restart tree, is computed and exploited to manage the change scope discovery problem.
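
To make the scope discovery idea concrete, the following is a minimal sketch and not the method described in the paper. It assumes a flat, hypothetical provenance trace in which each execution record lists the dependency versions it used; the names `Execution` and `change_scope`, the sample case ids, and the version labels are all illustrative assumptions. It does not model hierarchical components or the restart tree.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    """Hypothetical provenance record for one component execution."""
    case_id: str    # the input case (e.g. one patient genome)
    component: str  # the process component that ran
    used: dict      # dependency name -> version used in this run


def change_scope(trace, dependency, old_version):
    """Return the sorted case ids whose outcomes used the outdated version."""
    return sorted(
        {e.case_id for e in trace if e.used.get(dependency) == old_version}
    )


# Illustrative trace: only patient-1 was annotated with refdb v1.
trace = [
    Execution("patient-1", "annotate", {"refdb": "v1", "tool": "2.0"}),
    Execution("patient-2", "annotate", {"refdb": "v2", "tool": "2.0"}),
    Execution("patient-3", "align", {"tool": "2.0"}),
    Execution("patient-1", "align", {"tool": "2.0"}),
]

affected = change_scope(trace, "refdb", "v1")
print(affected)
```

Under these assumptions, a new release of `refdb` that supersedes `v1` would require re-computation only for the cases returned by `change_scope`, rather than for the whole cohort.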


Publication metadata

Author(s): Cala J, Missier P

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 7th International Provenance and Annotation Workshop, IPAW 2018

Year of Conference: 2018

Pages: 3-15

Online publication date: 06/09/2018

Acceptance date: 09/07/2018

Publisher: Springer Verlag

URL: https://doi.org/10.1007/978-3-319-98379-0_1

DOI: 10.1007/978-3-319-98379-0_1

Series Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ISBN: 9783319983783

