Butler Lampson
There
are lots of protocols for establishing connections (or equivalently, doing
at-most-once message delivery) across a network that can delay, reorder,
duplicate and lose packets. Most of the
popular ones are based on three-way handshake, but some use clocks or extra
stable storage operations to reduce the number of messages required. It’s hard to understand why the protocols
work, and there are almost no correctness proofs; even careful specifications are
rare.
I
will give a specification for at-most-once message delivery, an informal
account of the main problems an implementation must solve and the common
features that most implementations share, and outlines of proofs for three
implementations. The specifications and
proofs based on Lamport’s methods for using abstraction functions to understand
concurrent systems, and I will say something about how his methods can be
applied to many other problems of practical interest.
Butler Lampson
October 30, 1995
This is joint work with Nancy Lynch.
The errors in this talk are mine, however.
Specify at-most-once message delivery.
Describe other features we want from an implementation
Give a framework for thinking about implementations.
Show how to prove correctness of an implementation.
Network Connections
or
Reliable At-Most-Once Messages
Messages are delivered in fifo order.
A message is not delivered more than once.
A message is acked only if delivered.
A message or ack is lost only if it is being sent between crash and recovery.
“Everything
should be made as simple as possible, but no simpler.”
A. Einstein
Make progress: regardless of crashes,
if both ends stay up a waiting
message is sent, and
otherwise both parties become idle.
Idle at no cost: an idle agent
has no state that changes for each
message, and
doesn’t send any packets.
Minimize stable storage operations — <<1 per message.
Use channels that are easy to implement:
They may lose, duplicate, or reorder
messages.
Some pragmatic issues we won’t discuss:
Retransmission policy.
Detecting failure of an attempt to
send or ack, by timing it out.
A system is defined by a safety and a liveness property:
Safety: nothing bad ever happens. Defined by a state machine:
A set of states. A state is a pair (external
state, internal state)
A set of initial states.
A set of transitions from one state
to another.
Liveness: something good eventually happens.
An action is a named set of transitions; actions partition the transitions.
For instance: put(m); get(m); crashs
A history is a possible sequence of actions, starting from an initial state.
The behavior of the system is the set of possible histories.
An external action is one in which the external state changes. Correspondingly there are external histories and behaviors.
An action is:
A name, possibly with parameters: put(“red”).
A guard, a predicate on the state which must be true for this action to be a possible transition: q ≠ < > and i > 3.
An effect, changes in some of the state variables: i := i + 1.
The entire action is atomic.
Example:
get(m): m first on q take first from q, if q now empty
and
status = ? then status := true
name
guard effect
State: q : sequence[M] := < >
status : {true, false, ?} := true
recs/r : Boolean :=
false
“Sender Actions” |
“Receiver Actions” |
||||
Name |
Guard |
Effect |
Name |
Guard |
Effect |
put(m)** |
|
if ~recs then |
get(m)* |
~recr,
m first on q |
take first from q, |
getAck(b)* |
~recs , status = b |
none |
|
|
|
crashs** |
|
recs
:= true |
crashr** |
|
recr
:= true |
recovers* |
recs |
recs
:= false |
recoverr* |
recr |
recr
:= false |
lose |
recs or recr |
delete any element |
|
|
|
Action |
q |
status |
Action |
q |
status |
Action |
q |
status |
Initially |
< > |
true |
|
|
|
|
|
|
put(“red”) |
<“red”> |
? |
|
|
|
|
|
|
get(“red”) |
< > |
true |
|
|
|
|
|
|
getAck(true) |
< > |
true |
|
|
|
|
|
|
put(“green”) |
<“green”> |
? |
|
|
|
|
|
|
crashr, lose |
<“green”> |
? |
crashr, lose |
<“green”> |
false |
crashr, lose |
< > |
false |
|
|
? |
getAck(false) |
<“green”> |
false |
getAck(false) |
< > |
false |
put(“blue”) |
<“green”, |
? |
put(“blue”) |
<“green”, |
? |
put(“blue”) |
<“blue”> |
? |
get(“green”) |
<“blue”> |
? |
get(“green”) |
<“blue”> |
? |
|
|
|
get(“blue”) |
< > |
true |
|
|
|
|
|
|
getAck(true) |
< > |
true |
|
|
|
|
|
|
State sr :
multiset[P] := {} P = I ´ M rs :
multiset[P] := {}
or I ´ Bool
Name |
Guard |
Effect |
Name |
Guard |
Effect |
sndsr(p) |
|
sr := sr È {p} |
sndrs(p) |
|
rs := rs È {p} |
rcvsr(p) |
p Î sr |
sr := sr – {p} |
rcvrs(p) |
p Î rs |
rs := rs – {p} |
losesr |
$ p | p Î sr |
sr := sr – {p} |
losers |
$ p | p Î rs |
rs := rs – {p} |
State: modes : {acked,
send, rec} := acked moder : {idle,
rcvd, ack} := idle
goods : set[ I ] :=
I goodr : set[ I
] := I
lasts : I lastr : I
cur : M buf :
sequence[M] := <>
Name |
Guard |
Effect |
Name |
Guard |
Effect |
|
put(m)** sndsr(i, m)* |
mode = acked
|
cur := m, last := i, none |
rcvsr(i, m) |
|
if i Î good
then |
|
|
|
|
get(m)* |
mode
= rcvd, |
take first from buf; |
|
rcvrs(i, –) |
|
if mode = send and |
sndrs(i, true)* |
mode = ack, |
mode := idle |
|
getAck |
mode = acked |
none |
|
|
|
|
Divide actions into external (marked * or **) and internal (unmarked).
External actions change external
state, internal ones don’t.
An external history is a history (sequence of actions) with all the internal actions removed.
T implements S if
every external history of T is an external history of S, and
T’s liveness property implies S’s liveness property.
Suppose we have a function f from T’s state to S’s state such that:
f takes initial states to initial states;
f maps every transition of T to a sequence of transitions of S, perhaps empty (i.e., the identity on S);
f(t)
S or skip ® f(t')
t T ® t'
f maps every external action of T to a sequence containing the same external action of S and no other external actions.
f maps every internal action of T to a sequence of internal actions.
Then T implements S.
Why bother? Transitions are simpler than histories.
(1) goods Ç ( {lasts} È ids(sr) È ids(rs) ) = {}
(2) goodr Ç ids(rs) = {}
(3) goodr Ê goods
(4) ((i, m) Î
sr and i Î goodr) implies m = cur
q
= <cur> if modes = send and lasts Î goodr
< > otherwise
+ buf
status
= ?
if modes
= send
true otherwise
Simplify
the spec and the implementations.
Save clever encodings for later.
Make a “working spec” that’s easier to handle:
It implements the actual spec.
It has as much non-determinism as
possible.
All the prophecy is between it and
the actual spec.
actual¬ implements working¬ implements
implemen-
spec spec tation
Find the abstraction function. The rest is automatic.
Give names to important functions of your state variables.
To design an implementation, first invent the guards you
need,
then figure out how to implement them.
If you add a variable h to the state space such that
If s is an old initial state then there’s an h such that (s, h) is initial;
If (s, h) ® (s', h') then s ® s';
If s ® s' then for any h there’s an h' such that (s, h) ® (s', h')
then the new state machine has the same histories as the old one.
Suppose we add mode := acked to crashs.
Consider the sequence put(“red”), snd, crashs, put(“blue”), snd.
Now we have sr = {(1, “red”), (2, “blue”)}. We need an ordering on identifiers to order these packets and maintain fifo delivery. On rcvsr(i, m) the receiver must remove all identifiers ≤ i from goodr.
But now “red” is lost if (2, “blue”) is received first. If we use the obvious abstraction function
q = the m’s from {(i, m) Î sr È (lasts, cur)| i Î goodr} sorted by i,
this loss doesn’t happen between crashs and recovers, as allowed by the spec, but later at the rcv.
We give a working spec that makes this delay explicit.
State: q : sequence[(M, Flag)] := < >
Flag = {OK, ?}
status : ({true, false, ?}, Flag) := true
recs/r : Boolean :=
false
Name |
Guard |
Effect |
Name |
Guard |
Effect |
||
put(m)** |
|
if ~recs then |
get(m)* |
~recr,
|
take first from q, |
||
getAck(b)* |
~recs
, |
none |
|
|
|
||
mark |
recs or recr |
in some element of q or in status, set flag := ? |
unmark |
|
in some element of q or in status, set flag := OK |
||
drop |
true |
delete some element of q
with flag := ?, |
|
|
|
||
If you add a variable p to the state space such that
If s is an old initial state then there’s a p such that (s, p) is initial;
If (s, p) ® (s', p') then s ® s';
If s ® s' then for any h' there exists an h such that (s, h) ® (s', h')
If s is an old state, there’s a p such that (s, p) is a new state.
then the new state machine has the same histories as the old one.
If T implements S, you can always add history and prophecy variables to T and then find an abstraction function to S.
Extend Flag to include a lost component drawn from the set {OK, lost}.
The lost component prophesies whether drop will attack or not.
The abstraction function is
qS = the first components of elements of qDP that are not lost
statusS = the first component of statusDP if it is not lost, else false.
State: q : sequence[(M, Flag)] := < > Flag = ({OK, ?},
status : ({true, false, ?}, Flag) := (true, OK2) {OK, lost})
recs/r : Boolean :=
false
Name |
Guard |
Effect |
Name |
Guard |
Effect |
||
put(m)** |
|
if ~recs then |
get(m)* |
~recr,
|
take first from q, |
||
getAck(b)* |
~recs
, status = |
none |
|
|
|
||
mark |
recs
or recr, |
in some element of q or in status, set flag := (?, x). If last of q is lost, set status flag (?, lost) |
unmark |
|
in some element of q or
in status that isn’t lost, |
||
drop |
true |
delete some lost element of q
|
|
|
|
||
State: modes : {acked,
send, rec} := acked moder : {idle,
rcvd, ack} := idle
goods : set[ I ] :=
{} goodr : set[ I
] := {}
lasts : I lastr : I
cur : M buf :
sequence[M] := <>
used : set[I] :=
{} nacks :
sequence[I] := <>
ack : Boolean := false
Name |
Guard |
Effect |
Name |
Guard |
Effect |
|
put(m)** choose-id sndsr(i, m)* |
mode = acked mode = needid, as usual |
mode := needid, mode := send, |
rcvsr(i, m) |
|
if i Î good
then |
|
|
|
|
get(m)* |
as usual |
as usual |
|
rcvrs(i, b) |
|
if mode = send and |
sndrs(i,T) sndrs(i,F) |
mode = ack, i first on nacks |
mode := idle take first from nacks |
|
getAck(b)* |
mode = acked, |
none |
|
|
|
|
State: goods : set[ I
] := {} goodr : set[ I
] := {}
lasts : I lastr : I
useds : set[I] := {} issued :
set[I] := {}
Name |
Guard |
Effect |
Name |
Guard |
Effect |
shrink-gs(i) grow-gs(i) |
true i Î good,
|
take i from good add i to good |
shrink-gr(i) grow-gr(i) |
i Ï goods È
i Ï issued |
take i from good add i to good, issued |
|
|
|
cleanup |
mode
≠ rcvd, |
last := nil |
recovers |
rec |
mode := acked, |
recoverr |
rec |
mode := idle, |
msg(id) = {m
| (id = lasts and m=current ) or id Î
ids(sr)
or (id = lastr and m is last on buf)}
This defines a partial function msg: ID
® M.
current-q = <(current, OK)> if modes = send and lasts Î goodr
or modes = needid and
goods
≤ goodr
<(current, ?)> if modes = needid and
not goods
≤ goodr
<> otherwise
inflightsr = {id | id Î
ids(sr) and id Î goodr
and not (id = lasts
and modes
= send) },
sorted by id to make a sequence
inflight-m = the sequence of Ms gotten from msg of each I in inflightrs.
inflightrs = {lasts} if (ack, lasts, true) Î rs and not lasts = lastr
queue |
the elements of bufr paired with OK |
|
|
+ the elements of inflight-m paired with ? |
|
|
+ current-q |
|
status |
(false, OK) if modes = rec else (?, f) if current-q = <(m, f)> (?, OK) if modes = send, lasts = lastr and buf ≠ empty (true, OK) if modes = send, lasts = lastr and buf = empty (true, ?) if modes = send and lasts Î inflightrs (false,
OK) if modes = send (ack, OK) if modes = acked |
Generic |
Five-packet handshake |
Liskov-Shrira-Wroclawski |
goods |
{id | (accept, jds,
id) Î rs} if mode = needid |
{times} – {lasts} |
goodr |
{lastr} if moder = accept |
{i | lower
< i |
lastr |
lastr if mode ≠ accept |
lastr
|
shrink-goods |
mode = needid and losers(accept, jds,
id) of last copy |
tick(id), id = times |
grow-goods |
sendrs(accept, jds, id) |
tick(id) |
cleanup |
receivesr(acked), mode = ack |
cleanup |
shrink-goodr |
mode = accept and receivesr(acked, idr, nil) |
increase-lower(id), |
grow-goodr |
mode = idle and receivesr(needid, ...), |
increase-upper(id), |
Client specification of at-most-once messages.
Working spec which maximizes non-determinism.
Generic implementation with good identifiers maintained by magic.
Two practical implementations
Five-packet handshake
Liskov-Shrira-Wrowclawski
You can find these slides on research.microsoft.com/lampson, as well as a paper (item 47 in the list of publications).
L. Lamport, A simple approach to specifying concurrent systems, Comm. acm 32, 1, Jan. 1989.
M. Abadi and L. Lamport, The existence of refinement mappings, DEC SRC research report 29, Aug. 1988.