6 April 1999. Thanks to John Ganter
Source: http://ganter.sandia.gov/orfac/NisError/ (full report, 50K).


SAND98-2737
Unlimited Release
Printed February 1999

Managing Errors to Reduce Accidents
in High Consequence
Networked Information Systems

John H. Ganter
Decision Support Systems Software Engineering
Sandia National Laboratories
P. O. Box 5800
Albuquerque, New Mexico 87185
jganter@sandia.gov, http://ganter.sandia.gov

This paper is based on a presentation at the Workshop on Information Assurance and Trustworthy Networks, held by the Cross Industry Working Team (XIWT) and Bellcore in Washington, D.C., 17-18 November 1998.

ABSTRACT

Computers have always helped to amplify and propagate errors made by people. The emergence of Networked Information Systems (NISs), which allow people and systems to quickly interact worldwide, has made understanding and minimizing human error more critical. This paper applies concepts from system safety to analyze how hazards (from hackers to power disruptions) penetrate NIS defenses (e.g., firewalls and operating systems) to cause accidents. Such events usually result from both active, easily identified failures and more subtle latent conditions that have resided in the system for long periods. Both active failures and latent conditions result from human errors. We classify these into several types (slips, lapses, mistakes, etc.) and provide NIS examples of how they occur. Next we examine error minimization throughout the NIS lifecycle, from design through operation to reengineering. At each stage, steps can be taken to minimize the occurrence and effects of human errors. These include defensive design philosophies, architectural patterns to guide developers, and collaborative design that incorporates operational experiences and surprises into design efforts. We conclude by looking at three aspects of NISs that will cause continuing challenges in error and accident management: immaturity of the industry, limited risk perception, and resource tradeoffs.

Contents

Introduction
Concepts for Describing Failures and Accidents in Systems
Some Terms for Describing Human Effects in Systems
System Defenses and Accident Trajectories
Paradoxical Defenses: Defenses that Have the Potential To Be Hazardous
Defenses Throughout the System Lifecycle
   - The design phase
   - The operations phase
   - Maintenance phase
Continuous Safety Management Challenges in NISs
Conclusions
References