On energy-efficient checkpointing in high-throughput cycle-stealing distributed systems

Lookup NU author(s): Dr Matthew Forshaw, Dr Stephen McGough, Dr Nigel Thomas


Abstract

Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow the execution of long-running computational tasks on compute resources subject to hardware and software failures and interruptions from resource owners. With increasing scrutiny of the energy consumption of IT infrastructures, it is important to understand the impact of checkpointing on the energy consumption of HTC environments. In this paper we demonstrate through trace-driven simulation on real-world datasets that existing checkpointing strategies are inadequate at maintaining an acceptable level of energy consumption whilst reducing the makespan of tasks. Furthermore, we identify factors important in deciding whether to employ checkpointing within an HTC environment, and propose novel strategies to curtail the energy consumption of checkpointing approaches.


Publication metadata

Author(s): Forshaw M, McGough AS, Thomas N

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: SMARTGREENS 2014 - Proceedings of the 3rd International Conference on Smart Grids and Green IT Systems

Year of Conference: 2014

Pages: 262-267

Online publication date: 03/04/2014

Publisher: SciTePress

URL: http://www.smartgreens.org/?y=2014

ISBN: 9789897580253

