ePrints

Fault Injection Based Assessment of Fail-Silence Provided by Process Duplication versus Internal Error Detection

Lookup NU author(s): Dr Neil Speirs

Abstract

In this paper, two software-based architectures for providing fail-silent processes, Voltan and Chameleon ARMORs, are analyzed using fault injection. The goal is to compare the fail-silence coverage provided by the internal error detection techniques in Chameleon ARMORs with the ideal case of full duplication provided by Voltan. Rather than providing fault tolerance through redundant customized hardware, Voltan and Chameleon take the alternative approach of providing fail-silence in software using "off-the-shelf" hardware components. Voltan uses duplicated processes to provide the abstraction of a fail-silent node running on a conventional processor. Chameleon supports a range of execution modes, including replication and a variety of error detection techniques, to provide node and process fail-silence. This study compares only the self-checking features of Chameleon ARMORs (i.e., ARMORs provided with the internal detection techniques) with full duplication in Voltan. The paper presents results from three injection campaigns using two applications: a Fast Fourier Transform and a radix sort. The first campaign, designed to exercise the specific detection techniques in each system, yielded a fail-silence coverage of 100% for Voltan and 99.5% for Chameleon ARMORs. The second campaign, in which faults were injected into areas not directly protected by any detection technique, gave a coverage of 43.3% for Voltan and 45.8% for Chameleon ARMORs. The third campaign, in which random injections were made into the heap, stack, and code segments of the application processes, showed Voltan to be fail-silent 97.5% of the time and Chameleon ARMORs 84.6% of the time. In addition to assessing the fail-silence achieved by the two systems, the study offers insights into the issues involved in using fault injection to compare systems with different designs, implementations, and assumptions.
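As a rough illustration of how duplication yields fail-silence, and why it can still miss some faults, the Python sketch below models a duplicated computation that emits a result only when both replicas agree, then runs a toy injection campaign against it. It is not Voltan's actual protocol, which the abstract does not describe. The functions `duplicated_run`, `flip_random_bit`, `campaign`, and `fail_silence_coverage` are hypothetical names. The single-bit-flip fault model is an assumption, and so is the definition of coverage as silent outcomes divided by all non-masked outcomes. Python's `sorted` stands in for the radix sort application.

```python
import random
from typing import Callable, Dict, List, Optional

Compute = Callable[[List[int]], List[int]]
Fault = Callable[[List[int]], List[int]]


def duplicated_run(compute: Compute, data: List[int],
                   corrupt_replica_b: Optional[Fault] = None,
                   corrupt_shared_input: Optional[Fault] = None) -> Optional[List[int]]:
    """Run two replicas of `compute` and emit a result only if they agree.

    Returns None (the node stays silent) when the replicas' outputs differ.
    """
    inputs = list(data)
    if corrupt_shared_input is not None:
        # Fault outside the replicated region: both replicas see the same bad input.
        inputs = corrupt_shared_input(inputs)
    out_a = compute(list(inputs))
    out_b = compute(list(inputs))
    if corrupt_replica_b is not None:
        # Fault confined to one replica.
        out_b = corrupt_replica_b(out_b)
    return out_a if out_a == out_b else None


def flip_random_bit(values: List[int], rng: random.Random) -> List[int]:
    """Return a copy of `values` with one bit of one element flipped."""
    corrupted = list(values)
    i = rng.randrange(len(corrupted))
    corrupted[i] ^= 1 << rng.randrange(16)
    return corrupted


def campaign(trials: int, shared_fraction: float, seed: int = 0) -> Dict[str, int]:
    """Inject one fault per trial and classify the node's behavior."""
    rng = random.Random(seed)
    counts = {"correct": 0, "silent": 0, "wrong": 0}
    for _ in range(trials):
        data = [rng.randrange(1000) for _ in range(8)]
        golden = sorted(data)  # fault-free reference output

        def flip(v: List[int]) -> List[int]:
            return flip_random_bit(v, rng)

        if rng.random() < shared_fraction:
            out = duplicated_run(sorted, data, corrupt_shared_input=flip)
        else:
            out = duplicated_run(sorted, data, corrupt_replica_b=flip)
        if out is None:
            counts["silent"] += 1  # mismatch detected, output suppressed
        elif out == golden:
            counts["correct"] += 1  # fault masked
        else:
            counts["wrong"] += 1  # fail-silence violated
    return counts


def fail_silence_coverage(counts: Dict[str, int]) -> float:
    """Assumed definition: silent outcomes / all non-masked outcomes."""
    non_masked = counts["silent"] + counts["wrong"]
    return 1.0 if non_masked == 0 else counts["silent"] / non_masked


if __name__ == "__main__":
    print(campaign(200, shared_fraction=0.0))  # {'correct': 0, 'silent': 200, 'wrong': 0}
    print(fail_silence_coverage(campaign(200, shared_fraction=1.0)))  # 0.0
```

In this toy model, a bit flip confined to one replica always produces a mismatch and is silenced. A corruption of the shared input, by contrast, is replicated identically and escapes as a wrong result. This is one plausible way that faults outside the protected region can defeat a comparison-based check.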


Publication metadata

Author(s): Stott DT, Speirs NA, Xu J, Bagchi S, Whisnant K, Kalbarczyk Z, Iyer RK

Publication type: Report

Publication status: Published

Series Title: Department of Computing Science Technical Report Series

Year: 2000

Pages: 24

Print publication date: 01/01/2000

Source Publication Date: 2000

Report Number: 694

Institution: Department of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne

URL: http://www.cs.ncl.ac.uk/publications/trs/papers/694.pdf
