Improving the Reliability of Cooperative Concurrent Systems with Exception Flow Analysis

Lookup NU author(s): Professor Alexander Romanovsky


Abstract

Developers of fault-tolerant distributed systems must guarantee that the fault tolerance mechanisms they build are, themselves, reliable. Otherwise, these mechanisms might end up contributing negatively to overall system dependability, thus defeating the purpose of introducing fault tolerance into the system. To achieve the desired levels of reliability, the development of mechanisms for detecting and handling errors should be rigorous or formal. We present an approach to modeling and verifying fault-tolerant distributed systems that use exception handling as the main fault tolerance mechanism. The proposed approach is based on a formal model for specifying the structure of a system in terms of cooperating participants that handle exceptions in a coordinated manner. We employ coordinated atomic actions as a representative of mechanisms for exception handling in concurrent systems. We have validated the proposed approach by means of two case studies: (i) a system responsible for managing a production cell; and (ii) a medical control system. For both systems, the proposed approach helped us to uncover design faults in the form of implicit assumptions and omissions in the original specifications.


Publication metadata

Author(s): Castor Filho F, Romanovsky A, Rubira CMF

Publication type: Report

Publication status: Published

Series Title: School of Computing Science Technical Report Series

Year: 2008

Pages: 43

Print publication date: 01/06/2008

Source Publication Date: June 2008

Report Number: 1105

Institution: School of Computing Science, University of Newcastle upon Tyne

Place Published: Newcastle upon Tyne

URL: http://www.cs.ncl.ac.uk/publications/trs/papers/1105.pdf

