1 What It Does
1.2 Checking It 2 The Protocol 2.1 The Algorithm 2.2 Checking It 2.3 Implementation Prev: Intermezzo 2 Next: Session 11 part 2 Contents |
1 What it DoesThe Alternating Bit (AB) protocol is an algorithm for sending data from one process to another in messages that can be lost. Before writing an algorithm, we should understand what the algorithm is supposed to achieve. For an algorithm that just computes a value and terminates, like the one in Session 10, correctness is expressed by a postcondition—a state predicate that must be true when the algorithm has terminated. For an algorithm that can run forever, a good way to specify it is with a higher-level, more abstract algorithm—one that describes what the algorithm should accomplish but not how it does it. What the AB protocol accomplishes is explained in a video. To watch it, click on the following link and then, on that page, click on What the Protocol Should Accomplish. The next 2 minutes 13 seconds of the video describe what the protocol is supposed to do. The Following Link
The video explains that a sender process A sends a sequence
of items to a receiver process B. I will call those items
messages.
Each message is a pair
consisting of a string and a one-bit value, where the one-bit value
alternates between 0 and 1. Process A sending the message is
represented by setting the value of variable 1.1 The Specification
The precise description of what the AB protocol does is algorithm
Module
CONSTANT Data Msgs == [data: Data, bit : {0,1}]The algorithm has two processes A and B , whose ids are
"A" and "B" .
We ignore liveness for now and consider only safety. The algorithm
therefore has this structure:
--algorithm ABSpec { variables ... ; process (A = "A") { ... } process (B = "B") { ... } }There are two (global) variables AVar and
BVar that have the same initial value, indicating that
process B is waiting for process A to send it a
value. That value can be any element of Msgs , so
the global variable declaration is:
variables AVar \in Msgs, BVar = AVar ;
Exercise 1 The video says that the initial value
of
Answer We just have to change the declaration of
AVar \in [data \in Data, bit \in {1}] AVar \in {m \in Msgs : m.bit = 1} AVar \in {[data |-> d, bit |-> 1] : d \in Data} A can send a message to B when
AVar.bit equals BVar.bit , which it does by
setting AVar to that message. The message can have
any data value, but the bit field must be the complement
of the current value of AVar.bit . It can keep
sending messages, so the operation of sending a message appears in a
while (TRUE) loop. The body of process A
is:
a: while (TRUE) { await AVar.bit = BVar.bit ; with (d \in Data) { AVar := [data |-> d, bit |-> 1 - AVar.bit] } }Since it is executed only when AVar is a record with just
data and bit components, the assignment to
AVar (in the body of the with statement) can
also be written as follows using PlusCal's
multiple assignment statement:
AVar.data := d || AVar.bit := 1 - AVar.bitProcess B can receive a message from A when
AVar.bit does not equal BVar.bit , which it
does by setting BVar to AVar . The
body of process B is:
b: while (TRUE) { await AVar.bit /= BVar.bit ; BVar := AVar } 1.2 Checking the Specification Let's now check algorithm
ABSpec_TypeOK == (AVar \in Msgs) /\ (BVar \in Msgs)(You'll see below why it's not named simply TypeOK .)
Use the model to check that ABSpec_TypeOK is an invariant
of the algorithm.
The only interesting invariant that I can think of to check is
that if
Inv == (AVar.bit = BVar.bit) => (AVar = BVar)Check that Inv is an
invariant of the algorithm.
2 The ProtocolThe AB protocol is a well-known algorithm for sending information over a lossy channel. The following is a description of it taken from a Wikipedia page on 19 October 2021. I've modified the terminology and shortened the description a bit. This algorithm is an abstraction. A practical implementation would include some method to detect corrupted messages—for example, by including a checksum with the message. A corrupted message will be thrown away, logically transforming it into a lost message. An implementor must decide when to resend a message or an acknowledgment. Resending too often wastes bandwidth; not sending often enough wastes time, increasing the delay caused by message loss. An algorithm is an abstract model of a program or system. Building any model involves a compromise between accuracy and simplicity. For this tutorial, I have favored simplicity to keep from obscuring the basic ideas of how to write and check algorithms with details of the particular algorithms.
2.1 The Algorithm
The PlusCal description of the AB protocol is algorithm
We represent message passing the same way as in algorithm
In addition to the processes
Process
a: while (TRUE) {
either { Send
The statement
either { S1 } or { S2 } ... or { SK }is executed by executing one of the statements S1, … , Sk , nondeterministically choosing which to execute. (Since TLC examines all possible executions, it will check all possibilities.) In process A , the
either clause sends the value of AVar to
B by appending it to the end of AtoB , so it is:
AtoB := Append(AtoB, AVar)The or clause receives the message from B at the
head of the sequence BtoA , which it can do only if
BtoA is non-empty. If that message equals
AVar.bit , then process A chooses a new value of
AVar , just as it did in algorithm
ABSpec . Regardless of the message's value, A
removes the message from the head of BtoA . The code
of the or clause of A is therefore:
await BtoA /= << >> ; if (Head(BtoA) = AVar.bit) { with (d \in Data) { AVar := [data |-> d, bit |-> 1 - AVar.bit] } }; BtoA := Tail(BtoA)
The body of process
B is:
b: while (TRUE) { either { BtoA := Append(BtoA, BVar.bit) } or { await AtoB /= << >> ; if (Head(AtoB).bit /= BVar.bit) { BVar := Head(AtoB) }; AtoB := Tail(AtoB) } }
Finally, we have the
c: while (TRUE) { either with (i \in 1..Len(AtoB)) { AtoB := RemoveElt(i, AtoB) } or with (i \in 1..Len(BtoA)) { BtoA := RemoveElt(i, BtoA) } }Open module Session11b and make sure you understand
the module up to and including the algorithm.
Run the translator, which should find no error.
2.2 Checking the Algorithm
Let's now check the algorithm with TLC.
If this were not such a simple algorithm, we would start with the
smallest possible model—one in which If the module has not been corrupted, so the algorithm actually is correct, it should still be running. It should be finding more and more reachable states. When such a simple algorithm keeps running for a long time, it usually means that it has infinitely many reachable states and will run forever. So, stop TLC. We need to ask ourselves, what variables can assume infinitely many values?
The variables To model check the algorithm, we have to tell TLC to examine only executions in which those sequences don't get too big. In the Toolbox, go to the Model Overview sub-tab of the Model and click on Additional Spec Options. (The link is right above the What is the model? section of the Model Overview window.) In the State Constraint section of the Spec Options page, you can enter an arbitrary state predicate. Enter this state predicate, which asserts that each of the two message queues has at most three elements:
(Len(AtoB) =< 3) /\ (Len(BtoA) =< 3)Now run TLC on the model. TLC will report that it found 960 distinct states. Replace 3 in the
state constraint by larger values and see how that changes how long
TLC runs and how many reachable states it finds. Also see
what happens when you change the number of elements in Data .
If a model specifies a state constraint, then when TLC finds a state not satisfying the constraint it does not look for other states reachable from that state. But it does consider that state to be reachable, and checks that it satisfies any invariants specified by the model. (Thus, TLC will not consider the state constraint to be an invariant unless it doesn't rule out any reachable states.)
Now that we have a model that TLC can execute, how do we use it to
check that the algorithm is correct? We're not considering
liveness yet, so the obvious thing to do is check that the algorithm
satisfies the invariants we expect it to satisfy.
The first invariant to check is type correctness.
Module
Here is a simple invariant of the algorithm that I thought of: If the
first message in
The algorithm allows the message queues
Exercise 2 Write an invariant of the algorithm that asserts
Hint Here are two ways to write the assertion
that a sequence
\E i \in 0..Len(seq) : \A j \in 1..Len(seq) : seq[j] = IF j =< i THEN ... ELSE ...Another way is to first define Repeat(v, n) to be
the sequence of length n all of whose elements equal
v (so Repeat(v, 0) is the empty
sequence for any v ). The assertion can then
be written
\E i \in 0..Len(seq) : seq = Repeat(..., i) \o Repeat(..., Len(seq)-i)Using TLC to check your attempts, you should be able to figure out what the value of each should be.
You should check all the invariants you can think of—including ones you're not sure actually are invariants. Even if the algorithm is correct, finding an execution that violates what you thought was an invariant can be very useful. It teaches you that you didn't completely understand the algorithm. Moreover, creating tests that produce the execution is an effective way to find bugs in an implementation of the algorithm.
2.3 ImplementationSimple Implementation
Checking algorithm
The conventional view is that algorithm
The conventional view comes from thinking of algorithm
To see how unnecessarily limiting the conventional view is, observe
that formula
Formula
Our method of representing an execution of an algorithm as a sequence
of states, which we call a behavior, is quite conventional.
In the conventional view, an algorithm generates behaviors.
Since algorithm
TLA+ is based on temporal logic, which means it is a language for
writing properties. For example, the property asserting that
The formula written in PlusCal
as algorithm
We also want
To check that algorithm
Foo == INSTANCE Session11aimports into the current module all the definitions from module Session11a , except with their names changed by prepending
Foo! . (I used the identifier Foo just
for fun; we should use a more helpful name like
ABSpec .)
Thus, formula ABSpec_TypeOK of module
Session11a is imported into the current module as
formula Foo!ABSpec_TypeOK , and formula Spec of that
module is imported as formula Foo!Spec .
Add the INSTANCE statement to module
Session11b after the algorithm's translation, and add
Foo!Spec to the list of properties the model should
check. Now run the model. TLC should report no
error.
Algorithm/property Implementation with Data RefinementWe have just seen the simplest form of implementation, where one algorithm implies another. A more complicated form of implementation involves data refinement. ModuleSession11c
contains a new version of the AB protocol
called algorithm AB2 . It was
obtained by replacing
variables AVar and BVar by
variables AVar2 and BVar2 ,
whose values are ordered pairs rather than records.
The pair
<< d, b>>
corresponds to the record
[data |-> d, bit |-> b] in algorithms AB
and ABSpec .
Algorithm
AVar_Impl == [data |-> AVar2[1], bit |-> AVar2[2]]and variable BVar is implemented by BVar_Impl , defined by:
BVar_Impl == [data |-> BVar2[1], bit |-> BVar2[2]]This means that for any sequence seq of states
satisfying AB2 :
If we create a sequenceNote that it makes no difference what values variables other than AVar and BVar have in the states of
seq_ABSpec .
Make sure you understand what it means for
I don't expect you to understand why the following statement is true:
Foo == INSTANCE Session11a WITH AVar <- AVar_Impl, BVar <- BVar_Implimports into the current module all the definitions from module Session11a , renamed as before, except with the
expressions AVar_Impl and BVar_Impl of the
current module substituted for the variables AVar and
BVar of module Session11a .
In particular, in the current module it defines Foo!Spec
to be formula Spec of module Session11a
(the TLA+ representation of algorithm ABSpec ),
with those substitutions.
Therefore, algorithm AB2 implements
algorithm ABSpec iff algorithm AB
satisfies the property Foo!Spec .
Module Session11c contains the definitions of
AVar_Impl and BVar_Impl and this
INSTANCE statement. Have TLC check that algorithm
AB2 satisfies property Foo!Spec .
There is no need to define
Foo == INSTANCE Session11a WITH AVar <- [data |-> AVar2[1], bit |-> AVar2[2]], BVar <- [data |-> BVar2[1], bit |-> BVar2[2]]I defined them because we needed names for those expressions in the explanation of implementation with data refinement. It would have made the explanation of what we were doing incomprehensible, but we could have used the names AVar
and BVar instead of AVar_Impl and
BVar_Impl in module Session11c .
The INSTANCE statement would then have become:
Foo == INSTANCE Session11a WITH AVar <- AVar, BVar <- BVarwhich can be abbreviated as
Foo == INSTANCE Session11aAn INSTANCE statement specifies what to substitute for
each declared variable and constant. If an explicit substitution
is omitted, an expression with the same name is substituted for that
variable or constant. Because the INSTANCE
statement in module Session11b had no WITH
clause, the variables AVar and BVar declared
in module Session11b (in the translation of the PlusCal
algorithm) were substituted for the variables of the same name in
module Session11a . In the INSTANCE
statements in modules Sessiion11b and
Session11c , the constant Data declared in
those modules was substituted for the constant Data
declared in module Session11a .
The implementation of I have only sketched the basic idea of implementation and data refinement. There is more to learn about how to write and verify higher-level specifications of algorithms, but it may be best learned in terms of TLA+ rather than PlusCal. ProgramsAn algorithm you write will probably be implemented by a program, or perhaps a collection of programs in a system. Understanding what it means for one algorithm to implement another enables you to understand what it means for a program to implement an algorithm. A program as well as an algorithm can be viewed as a property. PlusCal was designed to make the corresponding TLA+ property simple. Programming languages are designed to make programs efficiently executable on computers. The property corresponding to a program written in a practical programming language is complicated, whether it's written in TLA+ or any other language for writing mathematical formulas. However, you should have an intuitive idea of what constitutes the state of a program. The most obvious part of the state is the collection of variables and their values, and the objects pointed to by the variables. There are not yet industrial-strength tools for verifying that a program implements an algorithm written in PlusCal or TLA+. But you should understand approximately how the values of the variables of an algorithm are represented by the state of a program that implements it. This understanding will help you design and test the program. For example, it will allow you to add assertions to the program to test that, when algorithm variables are replaced by the program expressions that represent them, an invariant of the algorithm is satisfied by the program. back to top
|