






CSE 466
Critical Systems Engineering
Slide 1
Adapted from Ian Sommerville
• To introduce the notion of critical systems
• To describe critical system attributes (reliability, availability, maintainability, safety and security)
• To introduce techniques used for developing reliable and safe systems
• To discuss the importance of people in critical systems engineering
Slide 3
• A critical system is any system whose ‘failure’ could threaten human life, the system’s environment or the existence of the organisation which operates the system.
• ‘Failure’ in this context does NOT mean failure to conform to a specification but means any potentially threatening system behaviour.

• Communication systems such as telephone switching systems, aircraft radio systems, etc.
• Embedded control systems for process plants, medical devices, etc.
• Command and control systems such as air-traffic control systems, disaster management systems, etc.
• Financial systems such as foreign exchange transaction systems, account management systems, etc.
Slide 5
• Most critical systems are now computer-based systems
• Critical systems are becoming more widespread as society becomes more complex and more complex activities are automated
• People and operational processes are very important elements of critical systems - they cannot simply be considered in terms of hardware and software

• The cost of failure in a critical system is likely to exceed the cost of the system itself
• As well as direct failure costs, there are indirect costs from a critical systems failure. These may be significantly greater than the direct costs
• Society’s views of critical systems are not static - they are modified by each high-profile system failure
Slide 7
• Reliability
  - Concerned with failure to perform to specification
• Availability
  - Concerned with failure to deliver required services
• Maintainability
  - Concerned with the ability of the system to evolve
• Safety
  - Concerned with behaviour which directly or indirectly threatens human life
• Security
  - Concerned with the ability of the system to protect itself

• Attribute concerned with the number of times a system fails to deliver specified services. Difficult to define in an intuitive way
• Can’t be defined without defining the context of use of the system
• Metrics used
  - MTTF: Mean Time to Failure. Time between observed system failures
  - ROCOF: Rate of occurrence of failures. Number of failures in a given time period
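
As a rough illustration of the two metrics above, the sketch below computes them from a log of failure timestamps. The function names, the hour-based units and the sample data are assumptions made for this example, not part of the lecture. MTTF is taken here as the mean interval between consecutive observed failures, and ROCOF as the failure count divided by the length of the observation period.

```python
from statistics import mean


def mttf(failure_times):
    """Mean time between consecutive observed failures (same unit as the input)."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure an interval")
    ordered = sorted(failure_times)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return mean(intervals)


def rocof(failure_times, period):
    """Number of failures per unit time over an observation period."""
    if period <= 0:
        raise ValueError("observation period must be positive")
    return len(failure_times) / period


# Hypothetical log: hours since the start of observation at which failures occurred.
failures = [100, 250, 400, 700]
print(mttf(failures))         # mean of the intervals 150, 150 and 300
print(rocof(failures, 1000))  # 4 failures observed over 1000 hours
```

Both figures only make sense for a stated context of use: the same system observed under a different workload would produce a different failure log.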
Slide 13
• Critical systems attributes are NOT independent - the systems development process must be organised so that all of them are satisfied at least to some minimum level
• More rigorous (and expensive) development techniques have to be used for critical systems development because of the potential cost of failure

• Reliable systems should be ‘fault-free’ systems, where ‘fault-free’ means that the system’s behaviour always conforms to its specification
• Systems which are ‘fault-free’ may still fail because of specification or operational errors
• The cost of producing reliable systems grows exponentially as reliability requirements are increased. In reality, we can never be sure that we have produced a ‘fault-free’ system
Slide 15
• Faults and failures are not the same thing, although the terms are often used fairly loosely
• A fault is a static characteristic of a system, such as a loose nut on a wheel, an incorrect statement in a program, or an incorrect instruction in an operational procedure
• A failure is some unexpected system behaviour resulting from a fault, such as a wheel falling off or the wrong amount of a chemical being used in a reactor

• Achieving systems reliability is generally based on the notion that system failures may be reduced by reducing the number of system faults
• Fault reduction techniques
  - Fault avoidance
  - Fault detection
• Alternatively, reliability can be achieved by ensuring faults do not result in failures
  - Fault tolerance
Slide 17
• The use of development techniques which reduce the probability that faults will be introduced into the system
  - Certified development process
    - Use a process which is known to work
  - Formal specification of the system
    - Discovers anomalies before design
  - Use of ‘safe’ software development techniques
    - Avoidance of error-prone language constructs
    - Use of a programming language (such as Ada) which can detect many programming errors at compile-time
  - Certified sub-contractors

• The use of techniques in the development process which are likely to detect faults before a system is delivered
  - Mathematical correctness arguments
  - Measurement of test coverage
  - Design/program inspections and formal reviews
  - Independent verification and validation
  - Run-time monitoring of the system
  - Back-to-back testing
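
Back-to-back testing, the last item above, runs two versions of the same function on the same inputs and flags any disagreement. A minimal sketch follows. The two square-root implementations, the tolerance and the helper name `back_to_back` are hypothetical and were chosen only to make the example runnable.

```python
import math


def sqrt_newton(x, iterations=30):
    """Hypothetical version A: square root by Newton's method."""
    if x == 0:
        return 0.0
    guess = max(x, 1.0)
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess


def sqrt_library(x):
    """Hypothetical version B: square root from the standard library."""
    return math.sqrt(x)


def back_to_back(version_a, version_b, inputs, tolerance=1e-9):
    """Run both versions on every input and collect the disagreements."""
    mismatches = []
    for value in inputs:
        a, b = version_a(value), version_b(value)
        if not math.isclose(a, b, rel_tol=tolerance, abs_tol=tolerance):
            mismatches.append((value, a, b))
    return mismatches


test_inputs = [0, 0.25, 1, 2, 9, 1e6]
# An empty list means the two versions agree on every input tried.
print(back_to_back(sqrt_newton, sqrt_library, test_inputs))
```

Agreement does not show that either version is correct, only that the two behave alike on the inputs tried.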
Slide 19
• In critical situations, systems must be fault tolerant.
• Fault tolerance means that the system can continue in operation in the presence of system faults
• Even if the system has been demonstrated to be fault-free, it must also be fault tolerant as there may be specification errors or the validation may be incorrect

• There are three replicated identical components which receive the same input and whose outputs are compared
• If one output is different, it is ignored and component failure is assumed
• Based on most faults resulting from component failures rather than design faults and a low probability of simultaneous component failure
• Applied to both hardware and (in a different form) software systems
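
The scheme described above is commonly called triple modular redundancy. The sketch below shows one way the output comparison could work, using a majority vote over three replicas. The function name `vote`, the replica functions and the injected fault are hypothetical.

```python
from collections import Counter


def vote(outputs):
    """Return the majority value among replica outputs and the indices that disagree."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: more than one replica may have failed")
    suspects = [i for i, out in enumerate(outputs) if out != value]
    return value, suspects


def healthy(x):
    """Hypothetical replica that computes the intended result."""
    return 2 * x + 1


def faulty(x):
    """Hypothetical replica with a component failure."""
    return -1


replicas = [healthy, faulty, healthy]
result, suspects = vote([replica(10) for replica in replicas])
print(result, suspects)  # 21 [1]
```

Because the replicas are identical, a design fault would make all three produce the same wrong output and pass the vote. This is why the scheme depends on faults being independent component failures.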
Slide 25
• Mishap (or accident)
  - An unplanned event or event sequence which results in human death or injury. It may be more generally defined as covering damage to property or the environment
• Incident
  - A system failure which may potentially result in an accident
• Hazard
  - A condition with the potential for causing or contributing to an incident

• Car crash resulting from a brake system failure
  - Hazard: faulty brake control software
  - Incident: car fails to brake when instructed by driver
  - Accident: car leaves road and crashes
• Incorrect drug dosage administered due to faulty operating instructions
  - Hazard: nurse follows a set of faulty operating instructions for a drug delivery system
  - Incident: incorrect dosage of drug computed by system
  - Accident: incorrect dosage of drug delivered to patient
Slide 27
• Safety can be achieved by
  - Avoiding hazards: developing the system so that hazardous states do not arise. Example: proving a braking system meets its specification.
  - Ensuring hazards do not result in incidents: adding functionality to the system to detect and correct hazards. Example: adding fault tolerance to the braking system software.
  - Avoiding accidents: adding features to systems which mean that incidents do not result in an accident. Example: providing a backup braking system.
  - Reducing the chances that accidents will result in damage to people: adding protection to a system. Example: adding seat belts and airbags.

• Safety and reliability are not the same thing. Reliability is concerned with conformance to a given specification and delivery of service
• The number of faults which can cause safety-related failures is usually a small subset of the total number of faults which may exist in a system
• Safety is concerned with ensuring the system cannot cause damage, irrespective of whether or not it conforms to its specification
Slide 29
• System design should always be based around the notion that no single point of failure can compromise system safety. Systems should always be able to tolerate one failure
• However, accidents usually arise because of several simultaneous failures rather than a failure of a single part of the system
• Anticipating complex sub-system interactions when these sub-systems are failing is very difficult

[Figure: safety life cycle diagram. Boxes: Hazard analysis; Risk assessment; Safety requirements specification; Functional requirements specification; Safety-integrity requirements specification; Designation of safety-critical systems; Validation planning; Design and implementation; Verification; Safety validation; Operation and maintenance]
Slide 31
• People and associated operational processes are essential elements of critical systems
• People are probably the most important single source of failure in critical systems BUT they are also the most effective mechanism we have for incident/accident avoidance
• Human factors are significant in the design, the development and the operation of critical systems
Slide 32
• Many (the majority?) of systems failures are due to ‘errors’ made by operators of the system (pilots, controllers, signallers, etc.)
• However, it is arguable whether these operators should be blamed for these errors - in many cases they are a result of poor system design where the operational situation was not understood by the system designers
Slide 37
• Safety-critical systems are systems whose failure can damage people and the system’s environment
• Safety and reliability are not the same thing - reliable systems can be unsafe
• Process issues (a safety life cycle) are very important for safety-critical systems
• Human, social and organisational factors must be taken into account in the development of critical systems