NU author(s): Dr Jacek Cala, Professor Paolo Missier
This is the final published version of a report published by the School of Computing Science, University of Newcastle upon Tyne, 2017.
For re-use rights please refer to the publisher's terms and conditions.
In Data Science, knowledge generated by a resource-intensive analytics process is a valuable asset. Such value, however, tends to decay over time as a consequence of the evolution of any of the elements the process depends on: external data sources, libraries, and system dependencies. It is therefore important to be able to (i) detect changes that may partially or completely invalidate prior outcomes, (ii) determine the impact that those changes will have on those prior outcomes, ideally without having to perform expensive re-computations, and (iii) optimise the process re-execution needed to selectively refresh affected outcomes. This paper presents an extensive experimental study on how the selective re-computation problem manifests itself in a relevant analytics task for Genomics, namely variant calling and clinical interpretation, and how the problem can be addressed using a combination of approaches. Starting from this experience, we then offer a blueprint for a generic re-computation meta-process that makes use of process history metadata to make informed decisions about selective re-computations in reaction to a variety of changes in the data.
Author(s): Cala J, Missier P
Publication type: Report
Publication status: Published
Series Title: School of Computing Science Technical Report Series
Print publication date: 31/10/2017
Report Number: 1515
Institution: School of Computing Science, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne