Toggle Main Menu Toggle Search

Open Access padlockePrints

Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters

Lookup NU author(s): Mohammed Al-Hayanni, Dr Rishad Shafik, Dr Ashur Rafiev, Dr Fei Xia, Professor Alex Yakovlev

Downloads


Licence

This is the authors' accepted manuscript of a conference proceedings (inc. abstract) that has been published in its final definitive form by IEEE, 2017.

For re-use rights please refer to the publisher's terms and conditions.


Abstract

Traditional speedup models, such as Amdahls, facilitate the study of the impact of running parallel workloads on manycore systems. However, these models are typically based on software characteristics, assuming ideal hardware behaviors. As such, the applicability of these models for energy and/or performance-driven system optimization is limited by two factors. Firstly, speedup cannot be measured without instrumenting the original software codes, and secondly, the parallelization factor of an application running on specific hardware is generally unknown. In this paper, we propose a novel method, whereby standard performance counters found in modern many-core platforms can be used to derive speedup without instrumenting applications for time measurements. We postulate that speedup can be accurately estimated as a ratio of instructions per cycle for a parallel manycore system to the instructions per cycle of a single core system. By studying the application instructions and system instructions for the first time, our method leads to the determination of the parallelization factor and the optimal system configuration for energy and/or performance. The method is extensively demonstrated through experiments on three different platforms with core numbers ranging from 4 to 61, running parallel benchmark applications (including synthetic and PARSEC benchmarks) on Linux operating system. Speedup and parallelization estimations using our method and their extensive cross-validations show negligible errors (up to 8%) in these systems. Additionally, we demonstrate the effectiveness of our method to explore parallelization-aware energy-efficient system configurations for many-core systems using energy-delay-product based formulations.


Publication metadata

Author(s): Al-Hayanni M, Shafik R, Rafiev A, Xia F, Yakovlev A

Publication type: Conference Proceedings (inc. Abstract)

Publication status: Published

Conference Name: 2017 International Conference on High Performance Computing & Simulation

Year of Conference: 2017

Pages: 410-417

Online publication date: 14/09/2017

Acceptance date: 02/06/2017

Date deposited: 18/07/2017

Publisher: IEEE

URL: https://doi.org/10.1109/HPCS.2017.68

DOI: 10.1109/HPCS.2017.68

Library holdings: Search Newcastle University Library for this item

ISBN: 9781538632505


Share