Fault Tolerance and Standby Schemes in Computer Systems, Study notes of Mobile Computing

Moradabad Institute of Technology Mobile Computing

These notes cover fault tolerance techniques in computer systems, focusing on fault isolation, containment, and standby schemes. Standby schemes include cold, warm, and hot standby, each with its own advantages and disadvantages. Faults are classified by duration, underlying cause, and behavior. Checkpointing is a fault tolerance technique that must balance recovery cost against system performance.

Typology: Study notes

2010/2011

Uploaded on 09/04/2011

amit-mohta 🇮🇳



Fault Treatment

• In this phase, the fault is first isolated and then repaired. The repair procedure depends on the type of fault. Permanent faults require that the failed component be replaced by a non-failed component. This requires a standby component. The standby component has to be integrated into the system, which means that its state has to be synchronized with the state of the rest of the system. There are three general types of standby schemes:

• Cold standby -- This means that the standby component is not operational, so that its state needs to be changed fully when the cutover occurs. This may be a very expensive and lengthy operation. For instance, a large database may have to be fully reconstructed (e.g., using a log of transactions) on a standby disc. The advantage of cold standby schemes is that they do not introduce overhead during the normal operation of the system. However, the cost is paid in fault recovery time.

• Warm standby -- In this case, the standby component is used to keep the last checkpoint of the operational component that it is backing up. When the principal component fails, the backward error recovery can be relatively short. The cost of warm standby schemes is the cost of backward recovery discussed earlier (mainly high overhead).

• Hot standby -- In this approach, the standby component is fully active and duplicating the function of the primary component. Thus, if an error occurs, recovery can be practically instantaneous. The problem with this scheme is that it is difficult to keep two components in lock step. In contrast to warm standby schemes, in which synchronization is only performed during checkpoints, in this case it has to be done on a constant basis. Invariably, this requires communications between the primary and the standby, so that the overhead of these schemes is often higher than the overhead for warm standby.
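The trade-off between the three schemes can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the `Standby` class, the key-value write log, and the `checkpoint_every` parameter are hypothetical names invented for the example, and counting synchronization messages is a crude stand-in for real overhead (a real checkpoint transfer is usually far more expensive than a single mirrored write).

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Hypothetical standby replica of a key-value store (illustrative only)."""
    mode: str                       # "cold", "warm", or "hot"
    state: dict = field(default_factory=dict)
    applied: int = 0                # log entries already reflected in state
    sync_messages: int = 0          # overhead paid during normal operation

    def on_write(self, log, checkpoint_every):
        """Called by the primary after each write is appended to the log."""
        if self.mode == "hot":
            self._catch_up(log)     # mirror every write immediately
            self.sync_messages += 1
        elif self.mode == "warm" and len(log) % checkpoint_every == 0:
            self._catch_up(log)     # receive a checkpoint periodically
            self.sync_messages += 1
        # cold: nothing happens during normal operation

    def _catch_up(self, log):
        """Apply every log entry the standby has not seen yet."""
        for key, value in log[self.applied:]:
            self.state[key] = value
        self.applied = len(log)

    def take_over(self, log):
        """Cutover: replay the unseen entries and report how many there were."""
        replayed = len(log) - self.applied
        self._catch_up(log)
        return replayed


def simulate(mode, writes=100, checkpoint_every=30):
    """Run the primary for `writes` operations, then fail over to the standby."""
    log = []
    standby = Standby(mode)
    for i in range(writes):
        log.append((f"k{i % 10}", i))
        standby.on_write(log, checkpoint_every)
    replayed = standby.take_over(log)
    return standby.sync_messages, replayed, standby.state


for mode in ("cold", "warm", "hot"):
    msgs, replayed, _ = simulate(mode)
    print(f"{mode:>4}: sync messages={msgs}, entries replayed at cutover={replayed}")
```

With these parameters the cold standby pays nothing during normal operation but replays all 100 entries at cutover, the warm standby replays only the 10 entries written since its last checkpoint, and the hot standby replays none but exchanged a message on every write.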


Characteristics of Fault Tolerance

The basic characteristics of fault tolerance require:

• No single point of failure
• No single point of repair
• Fault isolation to the failing component
• Fault containment to prevent propagation of the failure
• Availability of reversion modes

Fault Classifications

• Based on duration, faults can be classified as transient or permanent. A transient fault will eventually disappear without any apparent intervention, whereas a permanent one will remain unless it is removed by some external agency. While it may seem that permanent faults are more severe, from an engineering perspective they are much easier to diagnose and handle. A particularly problematic type of transient fault is the intermittent fault that recurs, often unpredictably.

• A different way to classify faults is by their underlying cause. Design faults are the result of design failures, such as a coding error. While it may appear that in a carefully designed system all such faults should be eliminated through fault prevention, this is usually not realistic in practice. For this reason, many fault-tolerant systems are built with the assumption that design faults are inevitable, and that mechanisms need to be put in place to protect the system against them. Operational faults, on the other hand, are faults that occur during the lifetime of the system and are invariably due to physical causes, such as processor failures or disk crashes.

• Finally, based on how a failed component behaves once it has failed, faults can be classified into the following categories:

• Crash faults -- the component either completely stops operating or never returns to a valid state;
• Omission faults -- the component completely fails to perform its service;
• Timing faults -- the component does not complete its service on time;
• Byzantine faults -- these are faults of an arbitrary nature.
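The behavioral categories above can be expressed as a small decision procedure from the point of view of an observer. This is a minimal sketch, not a definitive detector: the `classify` function and its inputs (`alive`, `latency`, `deadline`, `value_ok`) are hypothetical, and it assumes the observer somehow knows whether the component is still running and whether its answer is correct, which in a real distributed system is often impossible to tell with certainty.

```python
from enum import Enum
from typing import Optional


class FaultKind(Enum):
    NONE = "no fault observed"
    CRASH = "crash"
    OMISSION = "omission"
    TIMING = "timing"
    BYZANTINE = "byzantine"


def classify(alive: bool, latency: Optional[float],
             deadline: float, value_ok: bool) -> FaultKind:
    """Classify one request/response observation (all inputs are assumed known).

    alive    -- whether the component is still running (e.g. from a heartbeat)
    latency  -- seconds until the response arrived, or None if none arrived
    deadline -- maximum acceptable latency in seconds
    value_ok -- whether the returned value was correct
    """
    if not alive:
        return FaultKind.CRASH        # component stopped operating
    if latency is None:
        return FaultKind.OMISSION     # running, but the service was not performed
    if not value_ok:
        return FaultKind.BYZANTINE    # arbitrary behavior, here a wrong answer
    if latency > deadline:
        return FaultKind.TIMING       # correct answer, but too late
    return FaultKind.NONE


print(classify(alive=True, latency=2.5, deadline=1.0, value_ok=True).value)
```

The order of the checks is a design choice: a wrong answer is reported as Byzantine even when it is also late, since an arbitrary fault subsumes the narrower categories.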


Fault Tolerance

• Checkpointing is a fault tolerance technique widely used in various types of computer systems. In checkpointing, an important issue is how to achieve a good trade-off between the recovery cost and the system performance. Excessive checkpointing degrades performance because of the costly I/O operations performed during checkpointing. Equidistant and equicost are two well-known checkpointing strategies for addressing this issue.
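The trade-off can be made concrete with a simple cost model. The sketch below assumes that "equidistant" means checkpoints taken at equal time intervals, and it uses a common first-order approximation (often attributed to Young) that is not taken from these notes: with checkpoint cost C, mean time between failures M, and interval T much smaller than M, the fraction of time lost is roughly C/T for writing checkpoints plus T/(2M) for redoing work after a failure. The function names and the numeric values are illustrative assumptions.

```python
import math


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """Approximate fraction of time lost for a given checkpoint interval.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework, since a failure loses on
                                 average half an interval of progress
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def best_interval(checkpoint_cost, mtbf, candidates):
    """Pick the candidate interval with the lowest approximate overhead."""
    return min(candidates,
               key=lambda t: overhead_fraction(t, checkpoint_cost, mtbf))


# Assumed example values, all in seconds.
checkpoint_cost = 2.0
mtbf = 3600.0

best = best_interval(checkpoint_cost, mtbf, range(10, 601, 10))
closed_form = math.sqrt(2.0 * checkpoint_cost * mtbf)  # minimizer of the model

print(f"best candidate interval: {best} s")
print(f"closed-form optimum:     {closed_form:.1f} s")
print(f"overhead at optimum:     {overhead_fraction(best, checkpoint_cost, mtbf):.3%}")
```

Under this model, checkpointing much more often than the optimum wastes time on I/O, while checkpointing much less often raises the expected recovery cost, which is the balance the paragraph above describes.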

Common Transaction

• Known database (Typically one)
• Bounded duration (Compared to long transactions)
• Few or no interactions with other concurrent events
• ACID properties easy to achieve
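A short transaction against a single database is the case where atomicity comes almost for free. The following sketch uses Python's standard `sqlite3` module with an in-memory database; the `account` table, the `transfer` helper, and the balances are assumptions made up for the illustration, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer fails after the credit to `bob` has already been applied inside the transaction, yet the balances are unchanged afterwards because the whole transaction is rolled back.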