Provenance Annotation and Analysis to Support Process Re-Computation

NU author(s): Dr Jacek Cala, Professor Paolo Missier


Licence

This is the authors' accepted manuscript of a conference proceedings (inc. abstract) that has been published in its final definitive form by Springer, 2018.

For re-use rights please refer to the publisher's terms and conditions.


Abstract

Many resource-intensive analytics processes evolve over time following new versions of the reference datasets and software dependencies they use. We focus on scenarios in which any version change has the potential to affect many outcomes, as is the case for instance in high throughput genomics where the same process is used to analyse large cohorts of patient genomes, or cases. As any version change is unlikely to affect the entire population, an efficient strategy for restoring the currency of the outcomes requires first to identify the scope of a change, i.e., the subset of affected data products. In this paper we describe a generic and reusable provenance-based approach to address this scope discovery problem. It applies to a scenario where the process consists of complex hierarchical components, where different input cases are processed using different version configurations of each component, and where separate provenance traces are collected for the executions of each of the components. We show how a new data structure, called a restart tree, is computed and exploited to manage the change scope discovery problem.


Publication metadata

Author(s): Cala J, Missier P

Editor(s): Belhajjame K; Gehani A; Alper P

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 7th International Provenance and Annotation Workshop, IPAW 2018

Year of Conference: 2018

Pages: 3-15

Print publication date: 06/09/2018

Online publication date: 06/09/2018

Acceptance date: 09/01/2018

Date deposited: 02/06/2018

ISSN: 0302-9743

Publisher: Springer

URL: https://doi.org/10.1007/978-3-319-98379-0_1

DOI: 10.1007/978-3-319-98379-0_1

Notes: From IPAW 2018: Provenance and Annotation of Data and Processes

Series Title: Lecture Notes in Computer Science (LNCS)

ISBN: 9783319983783

