Distributed Program Construction

Fall 1999

Lecture 9: Distributed Algorithms

Reading:
 
 
 


COMP 413
        Clock Synchronization



COMP 413


Problem: Sometimes we simply need the exact time, not just an ordering.

Solution: Universal Coordinated Time (UTC):

UTC is broadcasted through shortwave radio and satellite. Satellites can give an accuracy of about +-0.5 ms.

Question: Does this solve all our problems? Don't we now have some global timing mechanism?
 


COMP 413 Problem: Suppose we have a distributed system with a UTC-receiver on some node ==> we still have to distribute its time to each machine.

Basic principle:

In practice: 1-p <= dC/dt <=  1+p

Goal: Never let two clocks in any system differ by more than d time units => synchronize at least every d/(2p) seconds.


COMP 413

Clock Synchronization Principles

Principle I: Every machine asks a time server for the accurate time at least once every d/(2p) seconds.

Okay, but you need an accurate measure of round trip delay, including interrupt handling and processing incoming messages.

Principle II: Let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its
present time.

Okay , you’ll probably get every machine in sync. Note: you don’t even need to propagate UTC time (why not?)

Fundamental problem: You’ll have to take into account that setting the time back is never allowed => smooth adjustments.
 


COMP 413


Problem: We first need to introduce a notion of ordering before we can order anything. The happened-before relation on the set of events in a distributed system is the smallest relation satisfying:

Note: This defines a partial ordering of events in a system with concurrently executing processes.



COMP 413 Problem: How do we maintain a global view of the system's behavior that is consistent with the happened-before relationship?

Solution: attach a timestamp C(e) to each event e, satisfying the following properties:

Problem: How to attach a timestamp to an event when there is no global clock? => maintain a consistent set of logical clocks, one for each node.



COMP 413 Each node Ni maintains a local counter Ci, and adjusts this counter according to the following rules: Property P1 is satisfied by (1); property P2 by (2) and (3).


COMP 413



 


COMP 413


Problem: It can happened that two event happen at the same (logical) time. Avoid this by attaching a node number to an event:

    Pi timestamps each event e with Ci(e).i

Then: Ci(a).i before Cj(b).j iff:



COMP 413

Extension to Multicasting: Vector Clocks

Result is causally ordered multicast delivery!


COMP 413

Mutual Exclusion

Problem: A number of nodes in a distributed system want exclusive access to some resource.

Basic solutions:



COMP 413
COMP 413

Mutual Exclusion: Ricart & Agrawala

Principle: The same as Lamport, except that acknowledgments aren't sent. Instead, replies (i.e, grants) are sent only when

In all other cases, reply is deferred until the resource is released




COMP 413
Election Algorithms

Principle: An algorithm requires that some process acts as a coordinator. The question is how to select this special process dynamically.

Note: In many systems the coordinator is chosen by hand (e.g. file servers). This leads to centralized solutions => single point of failure.

Question: If a coordinator is chosen dynamically, to what extent can we speak about a centralized or distributed solution?

Question: Is a fully distributed solution, i.e . one without a coordinator, always more robust than any centralized/coordinated solution?


COMP 413
Election by Bullying

Principle: Each process has an associated priority (weight). The process with the highest priority should always be elected as the coordinator.

Issue: How do we find the heaviest process?



COMP 413
Election by Bullying

Question: We’re assuming something very important here – what?


COMP 413
Election in a Ring

Principle: Process priority is obtained by organizing processes into a (logical) ring. Process with the highest priority should be elected as coordinator.

Question: Does it matter if two processes initiate an election?

Question: What happens if a process crashes during the election?


COMP 413
Distributed Transactions

COMP 413
Transactions

Essential: All READ and WRITE operations are executed, i.e . their effects are made permanent, at the execution of END_TRANSACTION.

Observation: Transactions form an atomic operation.


COMP 413
ACID Properties










Model: A transaction is a collection of operations on the state of an object (database, object composition, etc.) that satisfies the following properties:

Atomicity: All operations either succeed, or all of them fail. When the transaction fails, the state of the object will remain unaffected by the transaction.

Consistency: A transaction establishes a valid state transition. This does not exclude the possibility of invalid, intermediate states during the transaction’ s execution.

Isolation: Concurrent transactions do not interfere with each other. It appears to each transaction T that other transactions occur either before T, or after T, but never both.

Durability: After the execution of a transaction, its effects are made permanent: changes to the state survive failures.


COMP 413
Transaction Classification

Flat transactions: The most familiar one: a sequence of operations that satisfies the ACID properties.

Flat transactions with savepoints: A refinement that allows a rollback to a user-defined intermediate state of the transaction, rather than a complete abort.

Chained transactions: A sequence of transactions by which resources that are no longer needed are returned, but without giving up others.
 


COMP 413
Transaction Classification

Nested transactions: A hierarchy of transactions that allows (1) concurrent processing of subtransactions, and (2) recovery per subtransaction.

Distributed transactions: A (flat) transaction that is executed in a distributed environment => often implemented as a two-level nested transaction with one subtransaction per node.

Multi-level transactions: Like a nested transaction, but with full commit facilities per subtransactions . Aborts are compensated by reverse subtransactions that undo state changes.
 


COMP 413
Flat Transactions: Limitations

Problem: Flat transactions constitute a very simple and clean model for dealing with a sequence of operations that satisfies the ACID properties. However, after a series of successful operations all changes should be undone in the case of failure. Sometimes unnecessary:

Trip planning. Plan a intercontinental trip where all flights have been reserved, but filling in the last part requires some “experimentation.” The first reservations are known to be in order, but cannot yet be committed.

Bulk updates. When updating bank accounts for monthly interests we have to lock the entire database (every account should be updated exactly once: it is a transaction over the entire database.)

Better: each update is immediately committed. However, in the case of failure, we’ll have to be able to continue where we left off.
 

 

COMP 413
Transactions: Rollback
Two approaches:
 

COMP 413
Transactions: Concurrency
Control

Problem: Increase efficiency by allowing several transactions to execute at the same time.

Constraint: Effect should be the same as if the transactions were executed in some serial order.

Solution: Apply two-phase locking techniques


Question: Does it actually make sense to allow concurrent transactions on a single server?


COMP 413
Transaction Processing Model

Consider a collection E of transactions T1,.....,Tn. Goal is to conduct a serializable execution of E:

Note: Because we’re not concerned with the computations of each transaction, a transaction can be modeled as a log of read and write operations.

Two operations OPER(Ti,x) and OPER(Tj,x) on the same data item x, and from a set of logs may conflict at a data manager:

read-write conflict (rw): One is a read operation while the other is a write operation on x.

write-write conflict (ww): Both are write operations on x.


COMP 413
Basic Scheduling Theorem

Note: The important thing is that we process conflicting reads and writes in certain relative orders. This is what concurrency control is all about.

Note: It turns out that read-write and write-write conflicts can be synchronized independently, as long as we stick to a total ordering of transactions that is consistent with both types of conflicts.
 


COMP 413
Synchronization Techniques

Two-phase locking: Before reading or writing a data item, a lock must be obtained. After a lock is given up, the transaction is not allowed to acquire any
more locks.

Timestamp ordering: Operations in a transaction are timestamped, and data managers are forced to handle operations in timestamp order.

Optimistic control: Don’t prevent things from going wrong, but correct the situation if conflicts actually did happen. Basic assumption: you can pull it off
in most cases.
 


COMP 413
Two-phase Locking



COMP 413
Two-phase Locking

COMP 413
Two-phase Locking: Problems

Problem 1: System can deadlock. How? Practical solutions:


Problem 2: When should the scheduler actually release a lock:

No good way of testing condition (2) unless transaction has been committed or aborted.

Moreover: Assume the following execution sequence takes place: RELEASE(Ti,x) -> LOCK(Tj,x)  -> ABORT(Ti,x).

Consequence: scheduler will have to abort Tj as well (cascaded aborts).

Solution: Release all locks only at commit/abort time (strict two-phase locking).


COMP 413
Timestamp Ordering

Basic idea:

Scheduler adheres to following rules:
 


Note: rather aggressive for if a single OPER(Ti,x) is rejected, Ti will have to be aborted.


COMP 413
Timestamp Ordering
Question: Why would we do this?


COMP 413
Multiversion Timestamp Ordering

Basic idea: Introduce finer granularity by maintaining versions of data items, rather than just recording times when it was read or written:



COMP 413
Conservative Timestamp Ordering

Basic idea: Operations that may possibly have to be rejected, are delayed until it is certain that they can be accepted:

Note: We’re assuming ordered message delivery, and “idle” transaction managers occasionally tell the others that they’ re not sending any requests.

Observation: This policy is really conservative.


COMP 413
Optimistic Concurrency Control

Observation: (1) Maintaining locks is costly; (2) in practice not many conflicts.

Alternative: Go ahead immediately with all operations, use tentative writes everywhere (shadow copies), and solve conflicts later on.

Phases: allow operations tentatively -> validate effects  ->  make updates permanent.

Validation: Check two basic rules for each pair of active transactions Ti and Tj:

If one of the rules doesn’t hold: abort transaction.


COMP 413
Distributed Transactions

How to achieve atomic commit among multiple transaction managers in a distributed system?

Two-phase commit protocol:

Coordinator
Subordinate(s)
Write "prepare" to log
Send "prepare" message
Write "ready" to log
Send "ready" message
Collect all replies (assuming everyone is ready)
Write log record (transaction committed)
Send "commit" message
Write "commit" to log
Commit
Send "finished" message


COMP 413