
Open Access ePrints

Selective and recurring re-computation of Big Data analytics tasks: insights from a Genomics case study

NU author(s): Dr Jacek Cala, Dr Paolo Missier



Licence

This is the final published version of a report published in its definitive form by the School of Computing Science, University of Newcastle upon Tyne, in 2017.

For re-use rights please refer to the publisher's terms and conditions.


Abstract

In Data Science, knowledge generated by a resource-intensive analytics process is a valuable asset. Such value, however, tends to decay over time as a consequence of the evolution of any of the elements the process depends on: external data sources, libraries, and system dependencies. It is therefore important to be able to (i) detect changes that may partially or completely invalidate prior outcomes, (ii) determine the impact that those changes will have on those prior outcomes, ideally without having to perform expensive re-computations, and (iii) optimise the process re-execution needed to selectively refresh affected outcomes. This paper presents an extensive experimental study on how the selective re-computation problem manifests itself in a relevant analytics task for Genomics, namely variant calling and clinical interpretation, and how the problem can be addressed using a combination of approaches. Starting from this experience, we then offer a blueprint for a generic re-computation meta-process that makes use of process history metadata to make informed decisions about selective re-computations in reaction to a variety of changes in the data.
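The three steps named in the abstract (detect changes, estimate their impact, selectively re-execute) can be illustrated with a minimal sketch. It assumes a hypothetical history store in which each past run records the version of every dependency it used and, optionally, which items of that dependency it actually read. All names here (`ExecutionRecord`, `Change`, `select_for_recomputation`, `annotation_db`) are illustrative assumptions, and the item-overlap test is one simple impact heuristic, not the method described in the report.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionRecord:
    """Hypothetical history entry for one past run of the analytics process."""
    outcome_id: str
    # Version of each dependency (data source, library, ...) used by the run.
    dependency_versions: dict[str, str]
    # Items of each dependency the run actually read (optional, per dependency).
    items_used: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Change:
    """Hypothetical description of a new release of one dependency."""
    dependency: str
    new_version: str
    # Items added, removed or modified between the old and new release.
    changed_items: set[str]


def select_for_recomputation(history, change):
    """Return the ids of outcomes that a change may invalidate.

    (i) detection: the run used a different version of the changed dependency;
    (ii) impact: the run read at least one changed item, or, when no
         item-level record exists, impact is conservatively assumed;
    (iii) selection: only those runs are returned for re-execution.
    """
    selected = []
    for record in history:
        used_version = record.dependency_versions.get(change.dependency)
        if used_version is None or used_version == change.new_version:
            continue  # dependency not used, or run already up to date
        used = record.items_used.get(change.dependency)
        if used is None or used & change.changed_items:
            selected.append(record.outcome_id)
    return selected


# Illustrative (made-up) history and change event.
history = [
    ExecutionRecord("run-A", {"annotation_db": "v1"}, {"annotation_db": {"g1", "g2"}}),
    ExecutionRecord("run-B", {"annotation_db": "v1"}, {"annotation_db": {"g3"}}),
    ExecutionRecord("run-C", {"annotation_db": "v2"}, {"annotation_db": {"g1"}}),
    ExecutionRecord("run-D", {"annotation_db": "v1"}),
]
change = Change("annotation_db", "v2", {"g1"})
print(select_for_recomputation(history, change))  # ['run-A', 'run-D']
```

In this toy history, `run-B` is skipped because none of the items it read changed, and `run-C` already used the new release; `run-D` is selected only because it lacks item-level history, which shows why finer-grained process history metadata can reduce the number of re-executions.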


Publication metadata

Author(s): Cala J, Missier P

Publication type: Report

Publication status: Published

Series Title: School of Computing Science Technical Report Series

Year: 2017

Pages: 22

Print publication date: 31/10/2017


Report Number: 1515

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne

URL: http://www.ncl.ac.uk/media/wwwnclacuk/schoolofcomputingscience/files/trs/1515.pdf

