









Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
Transactions in database management system
Typology: Lecture notes
1 / 16
This page cannot be seen from the preview
Don't miss anything!
Introduction-Properties of Transaction- Serializability- Concurrency Control – Locking
Mechanisms-Two Phase Commit Protocol-Dead lock.
Transaction
A transaction is a unit of program execution that accesses and updates various data
items.
Usually, a transaction is initiated by a user program written in a high-level data-
manipulation language or programming language (for example, SQL, COBOL, C, C++,
or Java), where it is delimited by statements (or function calls) of the form begin
transaction and end transaction.
The transaction consists of all operations executed between the begin transaction
and end transaction.
ACID Properties
To ensure integrity of the data, we require that the database system maintain the following
properties of the transactions:
1. Atomicity. A transaction is said to be atomic if all the operation of the transaction are
executed at once or none of them is executed.
2. Consistency. A database is said to be consistent if a transaction is executed in a isolated
manner.
3. Isolation. Even though multiple transactions may execute concurrently, the system
guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished
execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction
is unaware of other transactions executing concurrently in the system.
4. Durability. After a transaction completes successfully, the changes it has made to the
database persist, even if there are system failures.
These properties are often called the ACID properties ; the acronym is derived from the first
letter of each of the four properties.
Let Ti be a transaction that transfers 50 from account A to account B. This transaction can be
defined as:
Read (A);
Write (A);
read(B);
Write (B).
read (X), which transfers the data item X from the database to a local buffer belonging to
the transaction that executed the read operation.
write (X), which transfers the data item X from the the local buffer of the transaction that
executed the write back to the database.
Let us now consider each of the ACID requirements. (For ease of presentation, we consider
them in an order different from the order A-C-I-D).
Consistency : The consistency requirement here is that the sum of A and B be unchanged by
the execution of the transaction. Without the consistency requirement, money could be
created or destroyed by the transaction! It can be verified easily that, if the database is
consistent before an execution of the transaction, the database remains consistent after the
execution of the transaction. Ensuring consistency for an individual transaction is the
responsibility of the application programmer who codes the transaction.
Atomicity : Suppose that, just before the execution of transaction Ti the values of accounts A
and B are 1000 and 2000, respectively. Now suppose that, during the execution of
transaction Ti, a failure occurs that prevents Ti from completing its execution successfully.
Examples of such failures include power failures, hardware failures, and software errors.
Question 1 : During its execution, a transaction passes through several states, until it finally
commits or aborts. List all possible sequences of states through which a transaction may
pass. Explain why each state transition may occur.
Answer: The possible sequences of states are:-
a. active→ partially committed →committed. This is the normal sequence a successful
transaction will follow. After executing all its statements it enters the partially committed
state. After enough recovery information has been written to disk, the transaction finally
enters the committed state.
b. active → partially committed → aborted. After executing the last statement of the
transaction, it enters the partially committed state. But before enough recovery information
is written to disk, a hardware failure may occur destroying the memory contents. In this case
the changes which it made to the database are undone, and the transaction enters the
aborted state.
c. active →failed →aborted. After the transaction starts, if it is discovered at some point that
normal execution cannot continue (either due to internal program errors or external errors),
it enters the failed state. It is then rolled back, after which it enters the aborted state.
Concurrent Executions
Transaction-processing systems usually allow multiple transactions to run
concurrently. Allowing multiple transactions to update data concurrently causes
several complications with consistency of the data. Ensuring consistency in spite of
concurrent execution of transactions requires extra work; it is far easier to insist that
transactions run serially —that is, one at a time, each starting only after the previous
one has completed.The motivation for using concurrent execution in a database is
essentially the same as the motivation for using multiprogramming in an operating
system.
There are two good reasons for allowing concurrency:
Conflict Serializability
Let us consider a schedule S in which there are two consecutive instructions Ii and Ij, of
transactions Ti and Tj , respectively (i _= j). If Ii and Ij refer to different data items, then we
can swap Ii and Ij without affecting the results of any instruction in the schedule. However, if
Ii and Ij refer to the same data item Q, then the order of the two steps may matter. Since we
are dealing with only read and write instructions, there are four cases that we need to
consider:
1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is
read by Ti and Tj , regardless of the order.
2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is
written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written
by Tj. Thus, the order of Ii and Ij matters.
3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to those of the
previous case.
4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of these
instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q)
instruction of S is affected, since the result of only the latter of the two write instructions is
preserved in the database. If there is no other write (Q) instruction after Ii and Ij in S, then
the order of Ii and Ij directly affects the final value of Q in the database state that results
from schedule S.
View Serializability
Consider two schedules S and S’, where the same set of transactions participates in both
schedules. The schedules S and S’ are said to be view equivalent if three conditions are met:
schedule T2 with the single edge T2 - > T1, since all the instructions of T2 are executed before
the first instructions of T1 is executed.
If the precedence graph for S has a cycle then schedule S is not conflict serializable. If
the graph contains no cycle, then the schedule is known as conflict serializable.
A serializability order of the transactions can be obtained through topological sorting,
which determines a linear order consistent with the partial partial order of the precedence
graph.
Illustration of topological sorting
Thus, to test for conflict serializability, we need to construct the precedence graph and to
invoke a cycle detection algorithm. Cycle detection algorithm can be found in standard
textbooks on algorithms. Cycle detection algorithms, such as those based on depth first
search, require on the order of n2 operations, and where n is the number of vertices in the
graph (that is, number of transactions). Thus we have a practical scheme for determining
conflict serializability.
Transaction Definition in SQL
A data-manipulation language must include a construct for specifying the set of actions that
constitute a transaction. The SQL standard specifies that a transaction begins implicitly.
Transactions are ended by one of these SQL statements:
There a simple and efficient method for determining conflict serializability of a schedule.
Consider a schedule S. Then construct a directed graph, called a precedence graph , from S. This
graph consists of a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of
vertices consists of all the transactions participating in the schedule. The set of edges consists
of all edges T i
j for which one of three conditions holds:
1. Ti executes write(Q) before T j
executes read(Q).
2. Ti executes read(Q) before T j
executes write(Q).
3. Ti executes write(Q) before T j
executes write(Q).
Concurrency Control
Concurrent control is a technique to ensure that concurrent execution of transaction
must results in a consistent database.
It uses certain protocols to ensure the serializability of a schedule of concurrent
executing transactions.
Some of these protocols are:
Lock based protocol
Time stamp based protocol
Lock-Based Protocols
One way to ensure serializability is to require that data items be accessed in a mutually
exclusive manner; that is, while one transaction is accessing a data item, no other
transaction can modify that data item. The most common method used to implement this
requirement is to allow a transaction to access a data item only if it is currently holding a lock
on that item.
Lock
It is a variable associated with a data item X that reflects the status of data item X
with respect to possible operations that can be applied. A data item X can be locked
in two modes:
Shared lock
If a transaction T has obtained shared mode lock, then T can read but cannot write. It
is denoted by S.
Exclusive lock
If a transaction T has obtained exclusive mode lock, then T can both read and write. It
is denoted by X.
Transaction Ti may unlock a data item that it had locked at some earlier point. Note
that a transaction must hold a lock on a data item as long as it accesses that item.
Moreover, for a transaction to unlock a data item immediately after its final access of
that data item is not always desirable, since serializability may not be ensured.
Concurrency-
T1 T2 Control Manager
lock-X(B)
grant-X(B, T1)
read(B)
write(B)
unlock(B) lock-S(A)
grant-S(A, T2)
read(A)
unlock(A)
lock-S(B)
grant-S(B, T2)
read(B)
unlock(B)
display(A + B)
lock-X(A) grant-X(A, T2)
read(A)
write(A)
unlock(A)
Granting of Locks
When a transaction requests a lock on a data item in a particular mode, and no other
transaction has a lock on the same data item in a conflicting mode, the lock can be granted.
However, care must be taken to avoid the following scenario. Suppose a transaction T2 has a
shared-mode lock on a data item, and another transaction T1 requests an exclusive-mode
lock on the data item. Clearly, T1 has to wait for T2 to release the shared-mode lock.
Meanwhile, a transaction T3 may request a shared-mode lock on the same data item. The
lock request is compatible with the lock granted to T2, so T3 may be granted the shared-
mode lock. At this point T2 may release the lock, but still T1 has to wait for T3 to finish. But
again, there may be a new transaction T4 that requests a shared-mode lock on the same
data item, and is granted the lock before T3 releases it. In fact, it is possible that there is a
sequence of transactions that each requests a shared-mode lock on the data item, and each
transaction releases the lock a short while after it is granted, but T1 never gets the exclusive-
mode lock on the data item. The transaction T1 may never make progress, and is said to be
starved.
The Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol. This protocol
requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock. 2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as needed.
Once the transaction releases a lock, it enters the shrinking phase, and it can issue no more
lock requests.
Timestamps base Protocol
Timestamps:
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by
TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts
execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj
enters the system, then TS(Ti) < TS(Tj ). There are two simple methods for implementing this
scheme:
wait–die
The wait–die scheme is a non-preemptive technique. When transaction Ti requests a data
item currently held by T j
i
is allowed to wait only if it has a timestamp smaller than that of
j
(that is, Ti is older than T j
). Otherwise, Ti is rolled back (dies).
For example, suppose that transactions T22, T23, and T24 have timestamps 5, 10, and 15,
respectively. If T22 requests a data item held by T23, then T22 will wait. If T24 requests a data
item held by T23, then T24 will be rolled back.
wound–wait
The wound–wait scheme is a preemptive technique. It is a counterpart to the wait–die
scheme. When transaction Ti requests a data item currently held by T j
i
is allowed to wait
only if it has a timestamp larger than that of T j (that is, Ti is younger than T j ). Otherwise, T j is
rolled back (T j
is wounded by T i
Returning to our example, with transactions T22, T23, and T24, if T22 requests a data item
held by T23, then the data item will be preempted from T23, and T23 will be rolled back. If
T24 requests a data item held by T23, then T24 will wait.
Deadlock Detection and Recovery
If a system does not employ some protocol that ensures deadlock freedom, then a detection
and recovery scheme must be used. An algorithm that examines the state of the system is
invoked periodically to determine whether a deadlock has occurred. If one has, then the
system must attempt to recover from the deadlock. To do so, the system must:
O Maintain information about the current allocation of data items to
transactions, as well as any outstanding data item requests.
O Provide an algorithm that uses this information to determine whether the
system has entered a deadlock state.
O Recover from the deadlock when the detection algorithm determines that a
deadlock exists.
Database Recovery System
There are various types of failure that may occur in a system, each of which needs to be
dealt with in a different manner.
Transaction failure. There are two types of errors that may cause a transaction to fail:
Logical error. The transaction can no longer continue with its normal execution
because of some internal condition, such as bad input, data not found, overflow, or
resource limit exceeded.
System error. The system has entered an undesirable state (for example, deadlock),
as a result of which a transaction cannot continue with its normal execution. The
transaction, however, can be reexecuted at a later time.
System crash. There is a hardware malfunction, or a bug in the database software or the
operating system, that causes the loss of the content of volatile storage, and brings transaction
processing to a halt. The content of nonvolatile storage remains intact, and is not corrupted.
Disk failure. A disk block loses its content as a result of either a head crash or failure during a
data transfer operation. Copies of the data on other disks, or archival backups on tertiary
media, such as tapes, are used to recovers from the failure.
Storage Types
Volatile storage. Information residing in volatile storage does not usually survive
system crashes. Examples of such storage are main memory and cache memory.
Access to volatile storage is extremely fast, both because of the speed of the
memory access itself, and because it is possible to access any data item in volatile
storage directly.
Nonvolatile storage. Information residing in nonvolatile storage survives system
crashes. Examples of such storage are disk and magnetic tapes. Disks are used for
online storage, whereas tapes are used for archival storage.
Stable storage. Information residing in stable storage is never lost.
Deferred Database Modification
The deferred-modification technique ensures transaction atomicity by recording all
database modifications in the log, but deferring the execution of all write operations
of a transaction until the transaction partially commits.
When a transaction partially commits, the information on the log associated with the
transaction is used in executing the deferred writes. If the system crashes before the
transaction completes its execution, or if the transaction aborts, then the
information on the log is simply ignored.
The execution of transaction Ti proceeds as follows. Before T i
starts its execution, a
record <T i
start> is written to the log. A write(X) operation by T i
results in the writing
of a new record to the log. Finally, when T i
partially commits, a record <T i
commit> is
written to the log.
Immediate Database Modification
The immediate-modification technique allows database modifications to be output to
the database while the transaction is still in the active state. Data modifications
written by active transactions are called uncommitted modifications.
Before a transaction Ti starts its execution, the system writes the record <T i
start> to
the log. During its execution, any write(X) operation by T i is preceded by the writing
of the appropriate new update record to the log. When T i
partially commits, the
system writes the record <T i
commit> to the log.