NU author(s): Dr Matthew Forshaw, Dr Stephen McGough, Dr Nigel Thomas
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow long-running computational tasks to execute on compute resources that are subject to hardware and software failures and to interruptions from resource owners. Until recently, most work has focused on the performance gains sought through checkpointing, but with growing scrutiny of the energy consumption of IT infrastructures, it is increasingly important to understand the impact of checkpointing on the energy consumption of HTC environments. In this paper we demonstrate, through trace-driven simulation on real-world datasets, that existing checkpointing strategies are inadequate for maintaining an acceptable level of energy consumption whilst reducing the makespan of tasks. Furthermore, we identify factors that are important in deciding whether to employ checkpointing within an HTC environment, and propose novel strategies to curtail the energy consumption of checkpointing approaches.
Author(s): Forshaw M, McGough AS, Thomas N
Publication type: Conference Proceedings (inc. Abstract)
Publication status: Published
Conference Name: 3rd International Conference on Smart Grids and Green IT Systems (SMARTGREENS) 2014
Year of Conference: 2014
Date deposited: 15/08/2014