

Fault Tolerance and System Structuring

NU author(s): Professor Brian Randell


Abstract

We discuss a general approach to the design of fault-tolerant computing systems, concentrating on issues of system structuring rather than on the design of particular algorithms. Three forms of structuring are described. The first is based on the use of what we term "idealised fault-tolerant components". Such components provide a means of system structuring that makes it easy to identify which parts of a system are responsible for trying to cope with which sorts of faults. The second is a "recursive structuring" scheme. It involves using complete computers as the basic idealised fault-tolerant components of a distributed computing system whose functionality matches that of its component computers. Finally, we discuss a generalisation of the usual concept of an "atomic action", which provides a means of structuring both forward and backward error recovery in distributed systems. These discussions are given in general terms, and are also illustrated by brief accounts of recent and current work at Newcastle on the construction of UNIX-based fault-tolerant and distributed systems.
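The idea of an idealised fault-tolerant component can be illustrated with a minimal Python sketch. It assumes the formulation commonly found in the fault-tolerance literature, in which a component answers each request with either a normal response, an interface exception (the request itself was illegal) or a failure exception (the component could not provide the service). It also uses a saved checkpoint as a simple form of backward error recovery. The `IdealisedAccount` class, its `withdraw` method and the `fail_internally` flag are hypothetical and chosen only for illustration. This is not code or a design taken from the report.

```python
import copy


class InterfaceException(Exception):
    """Signalled when a request violates the component's interface (an illegal request)."""


class FailureException(Exception):
    """Signalled when the component cannot provide the requested service."""


class IdealisedAccount:
    """Hypothetical idealised fault-tolerant component: a simple account.

    Every request ends in exactly one of: a normal response, an
    InterfaceException, or a FailureException, so the caller can tell
    "you asked for something illegal" apart from "I could not cope".
    """

    def __init__(self, balance: int) -> None:
        self.state = {"balance": balance}

    def withdraw(self, amount: int, fail_internally: bool = False) -> int:
        # Interface checks: an illegal request is the caller's responsibility.
        if not isinstance(amount, int) or amount <= 0:
            raise InterfaceException(f"illegal amount: {amount!r}")
        if amount > self.state["balance"]:
            raise InterfaceException("insufficient funds")

        # Backward error recovery: checkpoint the state before changing it.
        checkpoint = copy.deepcopy(self.state)
        try:
            self.state["balance"] -= amount
            if fail_internally:
                # Simulated internal fault, detected while servicing the request.
                raise RuntimeError("internal fault")
            return self.state["balance"]  # normal response
        except RuntimeError as exc:
            # Restore the checkpoint so the component stays consistent, then
            # report failure instead of returning a possibly wrong result.
            self.state = checkpoint
            raise FailureException("withdrawal could not be completed") from exc


if __name__ == "__main__":
    acct = IdealisedAccount(100)
    print(acct.withdraw(30))
    try:
        acct.withdraw(10, fail_internally=True)
    except FailureException as err:
        print("failure:", err, "| balance restored to", acct.state["balance"])
```

Keeping the two exception kinds separate is what makes responsibility explicit in this sketch: an interface exception tells the caller that it must correct its own request, while a failure exception tells it that this component could not cope and that further recovery is now the caller's concern.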


Publication metadata

Author(s): Randell B

Publication type: Report

Publication status: Published

Series Title: Computing Laboratory Technical Report Series

Year: 1983

Pages: 24

Report Number: 189

Institution: Computing Laboratory, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne

URL: http://www.cs.ncl.ac.uk/publications/trs/papers/189.pdf

