

[PhD Thesis] Preventing State Divergence in Replicated Distributed Systems

Newcastle University author(s): Dr Alan Tully


Full text is not currently available for this publication.


N-Modular Redundancy (NMR) is a form of active replication in which each processor is replicated to form a node, and each processor replica within the node executes the same set of software component replicas. Communication between nodes takes the form of messages, which pass through a voting mechanism that masks processor failures. When the degree of replication is three, the technique is known as Triple Modular Redundancy (TMR), and it can tolerate the failure of a single processor within a node.

For voting to succeed, non-faulty software component replicas must output identical messages in an identical order. If we assume that software components are deterministic, then we need only ensure that the replicas process identical input messages in an identical order. Such software components conform to the well-understood and widely researched state machine model of active replication. However, most distributed programs employ mechanisms that the state machine model does not incorporate, such as timeouts and prioritized messages. These potential sources of non-determinism could cause the states of software component replicas to diverge; the replicas could then produce inconsistent responses to identical input messages, defeating the NMR voting mechanism.

The main contributions of this thesis are:
(i) an architecture for actively replicated processing that may be applied to any distributed system; and
(ii) a more expressive, enhanced model for software components that incorporates non-determinism, together with a demonstration of how a system of such software components may be replicated using a single, well-defined generic mechanism (the order process) to prevent state divergence.

Since the problem of identical ordering can be formulated as the interactive consistency problem, which is solvable in the presence of arbitrary (Byzantine) failures, the approach presented in this thesis, unlike any other published to date, is capable of tolerating such failures.
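The ordering argument above can be illustrated with a minimal Python sketch. It is not the thesis's design: the message format, `Replica`, `vote`, and `order_process` are hypothetical names chosen for illustration. It assumes deterministic replicas and a strict-majority voter (two out of three, as in TMR). It also assumes that the interactive consistency step has already given every non-faulty replica the same vector of locally observed event orders; that protocol itself is not implemented here. The rule for picking one order from the vector (most frequent, ties broken by a fixed total order) is likewise an illustrative choice. The sketch shows that when one replica sees a timeout-like event before a message, it diverges and uses up the voter's ability to mask a real fault. Ordering all inputs identically restores agreement.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# Hypothetical message format: (operation, operand).
Msg = Tuple[str, int]


class Replica:
    """Deterministic software component replica (state machine model)."""

    def __init__(self) -> None:
        self.state = 0

    def apply(self, msg: Msg) -> int:
        op, arg = msg
        if op == "add":
            self.state += arg
        elif op == "mul":
            self.state *= arg
        else:
            raise ValueError(f"unknown operation {op!r}")
        return self.state  # output message sent towards the voter


def run(order: Sequence[Msg]) -> List[int]:
    """Feed one input sequence to a fresh replica and collect its outputs."""
    replica = Replica()
    return [replica.apply(m) for m in order]


def vote(outputs: Sequence[Hashable]) -> Hashable:
    """Majority voter: masks a minority of faulty outputs, fails without a strict majority."""
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError(f"no majority among {list(outputs)}")
    return value


def order_process(agreed_vector: Sequence[Tuple[Msg, ...]]) -> Tuple[Msg, ...]:
    """Hypothetical order process. Assumes every non-faulty replica already holds
    the same vector of locally observed orders (the interactive consistency step).
    Each replica applies this same deterministic rule, so all obtain one order."""
    counts = Counter(agreed_vector)
    best = max(counts.values())
    # Most frequently observed order; ties broken by Python's tuple ordering.
    return min(seq for seq, c in counts.items() if c == best)


if __name__ == "__main__":
    a, t = ("add", 2), ("mul", 3)  # t stands in for a timeout-triggered action
    local = [(a, t), (t, a), (a, t)]  # replica 1 saw the timeout first
    print([run(seq)[-1] for seq in local])  # [6, 2, 6]: replica 1 has diverged

    agreed = order_process(local)
    finals = [run(agreed)[-1] for _ in local]
    print(finals, vote(finals))  # [6, 6, 6] 6

    print(vote([6, 6, 999]))  # 6: one faulty replica is masked
    try:
        vote([6, 2, 999])  # divergence plus one fault
    except RuntimeError as err:
        print("voter failed:", err)
```

The sketch keeps the agreement rule separate from the replicas on purpose. Whatever rule is chosen, the only requirement is that it is deterministic and applied to an identical input vector, so that every non-faulty replica arrives at the same delivery order.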

Publication metadata

Author(s): Tully A

Publication type: Report

Publication status: Published


Year: 1990

Institution: Computing Laboratory, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne

Notes: British Lending Library DSC stock location number: DX172721