Newcastle University author(s): Dr Jacek Cala, Professor Paolo Missier
© Springer Nature Switzerland AG 2018. Many resource-intensive analytics processes evolve over time, following new versions of the reference datasets and software dependencies they use. We focus on scenarios in which any version change has the potential to affect many outcomes, as is the case, for instance, in high-throughput genomics, where the same process is used to analyse large cohorts of patient genomes, or cases. As any version change is unlikely to affect the entire population, an efficient strategy for restoring the currency of the outcomes first requires identifying the scope of a change, i.e., the subset of affected data products. In this paper we describe a generic and reusable provenance-based approach to this scope discovery problem. It applies to a scenario where the process consists of complex hierarchical components, where different input cases are processed using different version configurations of each component, and where separate provenance traces are collected for the executions of each component. We show how a new data structure, called a restart tree, is computed and exploited to manage the change scope discovery problem.
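To make the scope discovery problem in the abstract concrete, the following is a minimal illustrative sketch, not the paper's method and not the restart tree itself. It assumes a flat, in-memory provenance trace; the `Execution` record, its fields, the `change_scope` function, and the sample data are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    """One hypothetical provenance record: a component run for a single case."""
    case_id: str      # the input case (e.g. one patient genome)
    component: str    # the process component that was executed
    dependency: str   # a reference dataset or software dependency it used
    version: str      # the version of that dependency at execution time
    output: str       # identifier of the data product that was generated


def change_scope(trace, dependency, old_version):
    """Return the affected data products, grouped by case.

    A data product is treated as affected if it was produced by an
    execution that used the superseded version of the changed dependency.
    """
    scope = {}
    for ex in trace:
        if ex.dependency == dependency and ex.version == old_version:
            scope.setdefault(ex.case_id, []).append(ex.output)
    return scope


# Invented sample trace: three cases processed under different configurations.
trace = [
    Execution("case-1", "annotate", "refdb", "v1", "case-1/report"),
    Execution("case-2", "annotate", "refdb", "v2", "case-2/report"),
    Execution("case-3", "align", "aligner", "0.7", "case-3/bam"),
    Execution("case-3", "annotate", "refdb", "v1", "case-3/report"),
]

# A new release supersedes "refdb" v1; only cases that used v1 are in scope.
affected = change_scope(trace, "refdb", "v1")
print(affected)
```

In this sketch only `case-1` and `case-3` fall within the scope of the change, so only their reports would need to be recomputed. The scenario in the abstract is harder: components are hierarchical and their provenance traces are collected separately, which a flat scan like this does not address.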
Author(s): Cala J, Missier P
Publication type: Conference Proceedings (inc. Abstract)
Publication status: Published
Conference Name: 7th International Provenance and Annotation Workshop, IPAW 2018
Year of Conference: 2018
Online publication date: 06/09/2018
Acceptance date: 09/07/2018
Publisher: Springer Verlag
Series Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)