Distributed Program Construction
Fall 1999
Lecture 8: Date Replication and Coherence
Reading:
-
D. Mosberger: “Memory Consistency Models. ” Operating
Systems Reviews, 27(1):18–26, Jan. 1993.
-
D. B. Terry et al.: “Session Guarantees for Weakly
Consistent Replicated Data. ” In Proc. Third Int’l Conf . on Parallel and
Distributed Information Systems , pp . 140–149, Austin, TX, Sept. 1994.
IEEE.
COMP 413
Sharing Replicated Data
-
Shared objects
-
Replicated shared objects
Problem: If objects (or data) are shared, we need
to do something about concurrent accesses to guarantee state consistency.
COMP 413
Concurrency Control: Nonreplicated Shared Objects
Basic issue: If the shared object resides in
one address space, controlling concurrency is easy:
-
Protect the object using local locks and condition variables
-
Serialize access by the server
COMP 413
Concurrency Control: Replicated Shared Objects
Problem: Having copies of shared objects (data), and
concurrent updates, may get into serious consistency problems
COMP 413
Performance and Scalability
Main issue: To keep replicas
consistent, we need to ensure that all conflicting operations are done
in the the same order everywhere
Conflicting operations: From
the world of transactions:
-
Read–write conflict: a read operation
and a write operation act concurrently
-
Write–write conflicts: two concurrent
write operations
Guaranteeing global ordering on conflicting operations is
a costly operation, downgrading scalability
Solution: weaken consistency requirements so that,
hopefully, global synchronization can be avoided
COMP 413
Data-Centric Coherence Models
Strong consistency models: Operations on shared data
are synchronized:
-
Strict consistency (related to time)
-
Sequential consistency (what we are used to)
-
Causal consistency (maintains only causal relations)
-
PRAM consistency (maintains only individual ordering)
Weak consistency models: Synchronization occurs only
when shared data is locked and unlocked:
-
General weak consistency
-
Release consistency
-
Entry consistency
Observation: The weaker the consistency model, the
easier it is to build a scalable solution.
COMP 413
Any read to a shared data item X returns the value stored
by the most recent write operation on X.
Observation: It doesn’t make sense to talk about
“the most recent” in a distributed environment.
-
Assume all data items have been initialized to 0
-
W(x)1: value 1 is written to x
-
R(x)1: reading x returns the value 1
Note: Strict consistency is what you get in the normal
sequential case, where your program does not interfere with any other program.
COMP 413
The result of any execution is the same as if the operations
of all processes were executed in some sequential order, and the operations
of each individual process appear in this sequence in the order specified
by its program.
Note: We’ re talking about interleaved executions:
there is some total ordering for all operations taken together .
r + w >= t, where
-
r : read time
-
w: write time
-
t: minmal packet transmission time between nodes
COMP 413
Writes that are potentially causally related must be seen
by all processes in the same order. Concurrent writes may be seen in a
different order by different processes.
COMP 413
Pipelined RAM Consistency
Writes done by a single process are received by all other
processes in the order in which they were issued, but writes from different
processes may be seen in a different order by different processes .
COMP 413
Weak Consistency
-
Accesses to synchronization variables are sequentially consistent.
-
No access to a synchronization variable is allowed to be
performed until all previous writes have completed everywhere.
-
No data access is allowed to be performed until all previous
accesses to synchronization variables have been performed.
Basic idea: You don’t care that reads and writes of
a series of operations are immediately known to other processes. You just
want the effect of the series as a whole to be known.
Observation: Weak consistency implies that we need
to lock and unlock data (implicitly or not).
COMP 413
Release Consistency
Idea: Divide access to a synchronization variable
into two parts: an acquire and a release phase . Acquire
forces a requester to wait until the shared data can
be accessed; release sends requester’ s local value to
shared memory.
COMP 413
Entry Consistency
-
With release consistency, all local
updates are propagated to other processors during release of shared variable.
-
With entry consistency, each shared
variable is associated with a synchronization variable.
-
When acquiring the synchronization
variable, the most recent values of its associated shared variables are
fetched.
Note: Where release consistency
affects all shared variables, entry consistency affects only those shared
variables associated with a synchronization variable.
Question: What would be a
convenient way of making entry consistency more or less transparent to
programmers?
COMP 413
Client-Centric Coherence Models
-
System model
-
Read-your-writes
-
Monotonic reads
-
Write-follows-reads
-
Monotonic writes
COMP 413
Client-Centric
Coherence Models
Goal: Show how we can perhaps avoid system-wide
consistency, by concentrating on what specific clients want, instead of
what should be maintained by servers.
Background: Most large-scale distributed systems
(i.e ., databases) apply replication for scalability, but can support only
weak consistency:
-
DNS: Updates are propagated slowly, and inserts may
not be immediately visible.
-
NEWS: Articles and reactions are pushed and pulled
throughout the Internet, such that reactions can be seen before postings.
-
Lotus Notes: Geographically dispersed servers replicate
documents, but make no attempt to keep (concurrent) updates mutually consistent.
-
WWW: Caches all over the place, but there need be
no guarantee that you are reading the most recent version of a page.
COMP 413
Consistency for Mobile Users
Example: Consider a distributed database to which
you have access through your notebook. Assume your notebook acts as a front
end to the database.
-
At location A you access the database doing reads
and updates.
-
At location B you continue your work, but unless you
access the same server as the one at location A, you may detect
inconsistencies:
-
your updates at A may not have yet been propagated
to B
-
you may be reading newer entries than the ones available
at A
-
your updates at B may eventually conflict with those
at A
Note: The only thing you really want is that the entries
you updated and/or read at A, are in B the way you left them
in A. In that case, the database will appear to be consistent to
you.
COMP 413
Basic Architecture
COMP 413
System Model
-
Each write has a globally unique identifier WID
COMP 413
Read Your Writes
Definition: if read R follows write W
in a session, and R is performed on DB(S,t), then W
should have been in DB(S,t):
Note: There is no guarantee that R returns
W:
there may have been writes from other clients at S between
W
and R.
Example: Updating your Web page and guaranteeing
that your Web browser shows the newest version instead of its cached copy.
COMP 413
Monotonic Reads
Intuitively: Ensure (within a sessions) that each
read operation is made only at a server containing all the writes that
were seen by previous reads in s.
Notation: A set WS of writes is complete
for read R and DB(S,t) (Complete(WS,S,t,R)) if and only if:
Notation: WS=RelevantWrites(S,t,R) iff:
Example: Automatically
reading your personal calendar updates from different servers. Monotonic
Reads guarantees that the user sees all updates, no matter from which server
the automatic reading takes place.
Example: Reading
(not modifying) incoming mail while you are on the move. Each time you
connect to a different e-mail server, that server fetches (at least) all
the updates from the server you previously visited.
COMP 413
Writes Follows Reads
Intuitively: If a read precedes a write (in a session),
then that write is performed after all writes that preceeded the read.
Definition: if a read R precedes a write
W,
and R is performed at server S, then if W is performed
at server S*, all relevant writes for R are also performed at S*,
and before W:
Note: We are imposing two conditions which need
not always be relevant:
-
Drop (1): See locally updated data items without having to
see all the writes at other servers that formally preceded the update.
However, you do not want old writes to “undo” your local update, so keep
(2). WFRO:
-
Drop (2): See reactions to posted articles only if you have
the original posting, but don’t worry about the ordering between reaction
and posting,
so keep (1). WFRP:
COMP 413
Monotonic Writes
COMP 413
Replica Coherence Protocols
-
Sequential consistency protocols
-
General design issues
COMP 413
Algorithms for Sequential Consistency
Observation: In the end we always want sequential
consistency, whether or not it is implemented using weak consistency in
combination with synchronization
variables (locks).
Observation: If we discard whether all data is
globally consistent (as is the case with release consistency), or specific
data is consistent (entry consistency), we need to distinguish three access
methods.
COMP 413
Read Remote, Write Remote
-
A single process is responsible for maintaining the shared
data.
-
All read and write requests are forwarded to the server.
-
The server does the read/write operation, and returns a reply
to the client.
-
Adv: Really simple implementation, and very easy to
maintain sequential consistency (for free).
-
Disadv: Server can become a bottleneck, but this can
be partially alleviated by distributing shared data using some data-address
hashing scheme.
COMP 413
Read Migrate, Write Migrate
-
Data is always migrated to the process that wants to access
it
-
No distinction between read or write accesses.
-
Better watch out for thrashing: data keeps migrating
between different processes so that shared data access rate drops too much.
-
In some cases , it can easily be integrated with virtual
memory techniques: a page fault gives the local OS full control, so that
fetching data can be handled transparently.
-
You’ll have to do something about locating data. Either multicasting
or keeping track of shared data through chains of forwarding pointers.
Doesn’t really scale.
COMP 413
Read Local, Write Remote
-
Essentially, get your own local copy of the data and read
it as long as it’ s valid.
-
Writing data means (1) f orwarding your update to a primary,
(2) having all copies invalidated, and (possibly) (3) doing the write locally
after receiving ACK from primary.
-
Experience shows that this is a very reasonable model, but
that more needs to be done in order to achieve competitive efficiency.
COMP 413
Read Local, Write Migrate
-
Get your own local copy of the data and read it as long as
it’ s valid.
-
Writing data means (1) becoming primary (fetching the most
recent state), (2) having all copies invalidated, and (3) doing the update
locally.
-
Interesting observation: do an immediate write, and delay
the invalidation (affects the consistency, but improves the performance).
-
Experience shows that this is a very reasonable model, but
that more needs to be done in order to achieve competitive efficiency.
COMP 413
Read Replicate, Write Replicate
-
Again, get your own local copy of the data and read it as
long as it’ s valid.
-
Writing data means (1) getting a local copy if you don’t
have it yet, (2) write to your local copy, and (3) update all other copies.
-
The hard part: to achieve sequential consistency, we need
a totally ordered multicast protocol. We’ re back where we started. It
turns out this is doable
for local distributed systems with hardware multicasting
facilities (e .g., Amoeba on Ethernet).
COMP 413
Replication Strategies (1/3)
Observation: Choosing one of the basic algorithms
for (sequential) consistency is only the first step. There are many more
refinements to make!
Model: We consider objects (and don’t worry whether
they contain just data or code, or both)
Distinguish different stores: A store is capable
of hosting a replica of an object:
-
Permanent store: Server always having a replica
-
Object-initiated store: Server that can dynamically
host a replica upon request of the object
-
Client-initiated store: Server that can dynamically
host a replica upon request of a client
COMP 413
Replication Strategies (2/3)
COMP 413
Replication Strategies (3/3)
Change distribution: What is distributed between
the replicas:
-
Notification/in validation messages
-
Transfer of state (full state, or only the differences)
-
Method shipping (send the operation that caused the change
of state)
Store responsiveness: When does a replica take action
when it notices inconsistency:
-
Immediate reaction
-
Lazy reaction (e .g., periodic, or after some event)
-
Remains passive (e .g., because it is most-up-to-date)
Store reaction: What does a replica actually do:
-
Pull (i.e ., fetch change)
-
Push (i.e ., propagate change)
COMP 413
Example Consistency Protocol
Observation: Basically, all stores in the Web apply
the same replication strategy regardless the content of the Web document
(there are a few exceptions)
Observation: Caching is becoming more and more
problematic in the Web, as the number of documents grows exponentially.
Again: It is the consistency requirements that
determines the applicability of scaling techniques.
COMP 413