1 What It Does

1.1 The Specification


1.2 Checking It


2 The Protocol

2.1 The Algorithm


2.2 Checking It


2.3 Implementation

Prev:  Intermezzo 2

Next:  Session 11 part 2

Contents
 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

Session 11   The Alternating Bit Protocol
Part 1.  Safety

Leslie Lamport

Last modified on 25 January 2023


1  What it Does

The Alternating Bit (AB) protocol is an algorithm for sending data from one process to another in messages that can be lost.  Before writing an algorithm, we should understand what the algorithm is supposed to achieve.  For an algorithm that just computes a value and terminates, like the one in Session 10, correctness is expressed by a postcondition—a state predicate that must be true when the algorithm has terminated.  For an algorithm that can run forever, a good way to specify it is with a higher-level, more abstract algorithm—one that describes what the algorithm should accomplish but not how it does it.

What the AB protocol accomplishes is explained in a video.  To watch it, click on the following link and then, on that page, click on What the Protocol Should Accomplish.  The next 2 minutes 13 seconds of the video describe what the protocol is supposed to do.  The Following Link

The video explains that a sender process A sends a sequence of items to a receiver process B.  I will call those items messages.  Each message is a pair consisting of a string and a one-bit value, where the one-bit value alternates between 0 and 1.  Process A sending the message is represented by setting the value of variable AVar to the message; and its receipt by process B is represented by setting the value of variable BVar to the value of AVar.

1.1  The Specification

The precise description of what the AB protocol does is algorithm ABSpec in module Session11a, which you should open in the Toolbox.  Instead of sending strings, it sends elements of an unspecified set Data.  Just to give you a little practice using records, instead of sending ordered pairs, the algorithm sends records with a data field and a bit field. 

Module Session11a first EXTENDS the Integers module.  It then declares the constant set Data and the set Msgs of all messages that can be sent as follows: 

CONSTANT Data
Msgs  ==  [data: Data, bit : {0,1}]
The algorithm has two processes A and B, whose ids are "A" and "B".  We ignore liveness for now and consider only safety.  The algorithm therefore has this structure:

--algorithm ABSpec {
    variables ... ;
    process (A = "A") { ... }
    process (B = "B") { ... }
  }
There are two (global) variables AVar and BVar that have the same initial value, indicating that process B is waiting for process A to send it a value.  That value can be any element of Msgs, so the global variable declaration is:

variables  AVar \in Msgs,  BVar = AVar ;

Exercise 1   The video says that the initial value of AVar and BVar should have its bit equal to 1.  How can we modify the variable declaration to assert that?  Show the answer.

Answer   We just have to change the declaration of AVar.  Here are three equivalent ways to do that:

AVar \in [data \in Data, bit \in {1}]
AVar \in {m \in Msgs : m.bit = 1}
AVar \in {[data |-> d, bit |-> 1] : d \in Data}

Process A can send a message to B when AVar.bit equals BVar.bit, which it does by setting AVar to that message.  The message can have any data value, but the bit field must be the complement of the current value of AVar.bit.  It can keep sending messages, so the operation of sending a message appears in a while (TRUE) loop. The body of process A is:

a: while (TRUE) {
     await AVar.bit = BVar.bit ;
     with (d \in Data) { AVar := [data |-> d, bit |-> 1 - AVar.bit] } 
   } 
Since it is executed only when AVar is a record with just data and bit components, the assignment to AVar (in the body of the with statement) can also be written as follows using PlusCal's multiple assignment statement:

AVar.data := d  ||  AVar.bit := 1 - AVar.bit
Process B can receive a message from A when AVar.bit does not equal BVar.bit, which it does by setting BVar to AVar.  The body of process B is:

b: while (TRUE) { 
     await AVar.bit /= BVar.bit ;
     BVar := AVar                             
   } 

1.2  Checking the Specification

Let's now check algorithm ABSpec.  Creating a model for the algorithm requires specifying the set to be substituted for the constant Data.  This is such a simple algorithm that it could check in a few seconds a model in which Data has 1000 elements.  However, a set of three values is large enough to catch any likely error.  Let's use a model that substitutes the set {"dog", "cat", "rat"} for the constant Data.  Run the translator and then run TLC on the model to make sure that there's no obvious error. 

The next thing to check is type correctness.  The module contains this type invariant:

ABSpec_TypeOK  ==  (AVar \in Msgs) /\ (BVar \in Msgs)
(You'll see below why it's not named simply TypeOK.)  Use the model to check that ABSpec_TypeOK is an invariant of the algorithm.

The only interesting invariant that I can think of to check is that if AVar and BVar have the same bit component, then they are equal.  It is defined in the module by:

Inv  ==  (AVar.bit = BVar.bit) => (AVar = BVar)
Check that Inv is an invariant of the algorithm.

2  The Protocol

The AB protocol is a well-known algorithm for sending information over a lossy channel.  The following is a description of it taken from a Wikipedia page on 19 October 2021.  I've modified the terminology and shortened the description a bit. 

The Alternating Bit protocol is a simple network protocol that retransmits lost or corrupted messages using FIFO (first-in, first-out) semantics.  Messages are sent from sender A to receiver B.  Assume that the channel from A to B is initialized and that there are no messages in transit.  Each message from A to B contains a data part and a one-bit sequence number, which is a value that is 0 or 1.  B has two acknowledge codes that it can send to A: the values 0 and 1. 

When A sends a message, it resends it continuously, with the same sequence number, until it receives an acknowledgment from B that contains the same sequence number.  When that happens, A complements (flips) the sequence number and starts transmitting the next message. 

When B receives a message that is not corrupted and has sequence number 0, it starts sending 0 and keeps doing so until it receives a valid message with number 1.  Then it starts sending 1, etc. 

This means that A may still receive 0 when it is already transmitting messages with sequence number 1.  (And vice versa.)  It treats such messages as negative-acknowledge codes.  The simplest thing for it to do is to ignore them all and continue transmitting.

This algorithm is an abstraction.  A practical implementation would include some method to detect corrupted messages—for example, by including a checksum with the message.  A corrupted message will be thrown away, logically transforming it into a lost message.  An implementor must decide when to resend a message or an acknowledgment.  Resending too often wastes bandwidth; not sending often enough wastes time, increasing the delay caused by message loss. 

An algorithm is an abstract model of a program or system.  Building any model involves a compromise between accuracy and simplicity.  For this tutorial, I have favored simplicity to keep from obscuring the basic ideas of how to write and check algorithms with details of the particular algorithms. 

2.1  The Algorithm

The PlusCal description of the AB protocol is algorithm AB in module Session11b.  In addition to the variables AVar and BVar, it has variables AtoB and BtoA that represent the messages in transit from A to B and from B to A, respectively. 

We represent message passing the same way as in algorithm FindRoutes of Session 10.  The messages in transit from one process to another are a sequence seq of messages, where a message is sent by appending it to the end of seq and received by removing it from the front of seq.  This describes FIFO message delivery.  The value of AtoB will be a sequence of elements of Msgs and the value of BtoA will be a sequence of ones and zeros.  Initially AtoB and BtoA both equal the empty sequence << >>.  Module Session11b EXTENDS the Sequences module to import the definitions of the operations on sequences.

In addition to the processes A and B, algorithm AB has a process LoseMsgs, with id "L".  This process performs the actions of losing messages—that is, removing them from the sequences AtoB and BtoA of messages.

Process A can perform two kinds of actions: sending a message to B by appending it to the end of the sequence AtoB, and receiving a message from B by removing it from the head of BtoA.  Its body has the following form:

a: while (TRUE) { 
            either { Send AVar to B }
            or     { Receive message from B } 
          } 
The statement

either { S1 }  or { S2 }  ...  or { SK }
is executed by executing one of the statements S1 , S, nondeterministically choosing which to execute.  (Since TLC examines all possible executions, it will check all possibilities.)  In process A, the either clause sends the value of AVar to B by appending it to the end of AtoB, so it is:

AtoB := Append(AtoB, AVar)
The or clause receives the message from B at the head of the sequence BtoA, which it can do only if BtoA is non-empty.  If that message equals AVar.bit, then process A chooses a new value of AVar, just as it did in algorithm ABSpec.  Regardless of the message's value, A removes the message from the head of BtoA.  The code of the or clause of A is therefore:

await BtoA /= << >> ;
if (Head(BtoA) = AVar.bit) 
  { with (d \in Data) { AVar := [data |-> d, bit |-> 1 - AVar.bit] } 
  };
BtoA := Tail(BtoA)                                

The body of process B is similar, except

  • The either clause sends BVar.bit to A by appending it to BtoA.
  • The or clause receives a message from A and, if the message's bit component doesn't equal BVar.bit, then it sets BVar equal to the message.
The complete body of process B is:

b: while (TRUE) {
     either { BtoA := Append(BtoA, BVar.bit) }
     or     { await AtoB /= << >> ;
              if (Head(AtoB).bit /= BVar.bit) { BVar := Head(AtoB) };
              AtoB := Tail(AtoB)   
            }  
   } 

Finally, we have the LoseMsgs process that repeatedly removes a nondeterministically chosen message from either AtoB or BtoA.  It also uses an either/or statement, the either clause removing an element from AtoB and the or clause removing an element from BtoA.  It removes an element from a sequence of messages using the RemoveElt operator, which you defined in this exercise of Intermezzo 1. The definition of RemoveElt appears in the module before the comment containing the algorithm.  Here is the body of the process:

 c: while (TRUE) {
     either with (i \in 1..Len(AtoB)) { AtoB := RemoveElt(i, AtoB) }
     or     with (i \in 1..Len(BtoA)) { BtoA := RemoveElt(i, BtoA) } 
   } 
Open module Session11b and make sure you understand the module up to and including the algorithm.  Run the translator, which should find no error.

2.2  Checking the Algorithm

Let's now check the algorithm with TLC.  If this were not such a simple algorithm, we would start with the smallest possible model—one in which Data has only a single element.  But since it is so small, create a model with Data the same set of three values as in our model for ABSpec.  Run TLC on the model.

If the module has not been corrupted, so the algorithm actually is correct, it should still be running.  It should be finding more and more reachable states.  When such a simple algorithm keeps running for a long time, it usually means that it has infinitely many reachable states and will run forever.  So, stop TLC.  We need to ask ourselves, what variables can assume infinitely many values?

The variables AVar and BVar can assume only a small number of values.  However, the values of AtoB and BtoA are sequences.  An implementation of the protocol will try to avoid sending unnecessary messages, so a message queue should seldom contain more than one message.  But the algorithm allows executions in which the processes keep send messages faster than they are received, so the sequences AtoB and BtoA can keep getting longer and longer.

To model check the algorithm, we have to tell TLC to examine only executions in which those sequences don't get too big.  In the Toolbox, go to the Model Overview sub-tab of the Model and click on Additional Spec Options.  (The link is right above the What is the model? section of the Model Overview window.)  In the State Constraint section of the Spec Options page, you can enter an arbitrary state predicate.  Enter this state predicate, which asserts that each of the two message queues has at most three elements:

(Len(AtoB) =< 3) /\ (Len(BtoA) =< 3)
Now run TLC on the model.  TLC will report that it found 960 distinct states.  Replace 3 in the state constraint by larger values and see how that changes how long TLC runs and how many reachable states it finds.  Also see what happens when you change the number of elements in Data.

If a model specifies a state constraint, then when TLC finds a state not satisfying the constraint it does not look for other states reachable from that state.  But it does consider that state to be reachable, and checks that it satisfies any invariants specified by the model.  (Thus, TLC will not consider the state constraint to be an invariant unless it doesn't rule out any reachable states.)

Now that we have a model that TLC can execute, how do we use it to check that the algorithm is correct?  We're not considering liveness yet, so the obvious thing to do is check that the algorithm satisfies the invariants we expect it to satisfy.  The first invariant to check is type correctness.  Module Session11b defines the type invariant AB_TypeOK.  Restore the model to one that TLC can execute quickly and run it to check that AB_TypeOK is an invariant.  Before reading further, write all the invariants you can think of that the algorithm should satisfy.

Here is a simple invariant of the algorithm that I thought of: If the first message in BtoA equals AVar.bit, then AVar and BVar are equal.  Formula Inv of module Session11b is a precise statement of this invariants.  Have TLC check that Inv is an invariant of the algorithm. 

The algorithm allows the message queues AtoB and BtoA to be arbitrarily long.  But the values of those queues always have a special form.  Let a single-value sequence be a sequence all of whose elements are equal (so the empty sequence is a single-value sequence).  The values of AtoB and BtoA always have the form s1 \o s2, where s1 and s2 are single-value sequences.  For example, the value of BtoA might be <<0,0,0,1,1>> or <<0,0>>

Exercise 2 Write an invariant of the algorithm that asserts AtoB and BtoA are always the concatenation of two single-value sequences and that describes the elements of those sequences in terms of AVar and BVar Show hint.

Hint   Here are two ways to write the assertion that a sequence seq is the concatenation of two single-value sequences.  The first has the form:

\E i \in 0..Len(seq) : 
   \A j \in 1..Len(seq) : seq[j] = IF j =< i THEN ... ELSE ...
Another way is to first define Repeat(v, n) to be the sequence of length n all of whose elements equal v (so Repeat(v, 0) is the empty sequence for any v).  The assertion can then be written

\E i \in 0..Len(seq) : seq = Repeat(..., i) \o Repeat(..., Len(seq)-i)
Using TLC to check your attempts, you should be able to figure out what the value of each  ...  should be. 

You should check all the invariants you can think of—including ones you're not sure actually are invariants.  Even if the algorithm is correct, finding an execution that violates what you thought was an invariant can be very useful.  It teaches you that you didn't completely understand the algorithm.  Moreover, creating tests that produce the execution is an effective way to find bugs in an implementation of the algorithm. 

2.3  Implementation

Simple Implementation

Checking algorithm AB by checking invariants it should satisfy is nice.  However, we have a specification of what the algorithm should do—namely, algorithm ABSpec.  We would like to check that AB implements its specification ABSpec

The conventional view is that algorithm AB does not implement algorithm ABSpec.  An algorithm is viewed as a machine for generating executions.  In algorithm ABSpec, process A sends a value to process B in a single step, while in algorithm AB it takes at least one additional step.  Therefore AB and ABSpec are viewed as two different, incomparable algorithms.

The conventional view comes from thinking of algorithm ABSpec as specifying a universe described by the values of the two variables AVar and BVar, while AB specifies a universe described by the values of the four variables AVar, BVar, AtoB, and BtoA.  Since these are two different universes, statements about one are not statements about another. 

To see how unnecessarily limiting the conventional view is, observe that formula ABSpec_TypeOK is an invariant of algorithm AB as well as of algorithm ABSpec.  We can just copy its definition from module Session11a to module Session11b and have TLC check that it's an invariant of AB.  (I used different names for the type invariant in the two modules to save us from having to change its name when putting it in the other module.)  If ABSpec and AB are incomparable algorithms, how can they have the same invariant?

Formula ABSpec_TypeOK is not about a universe containing only the variables AVar and BVar.  It's a state predicate in any universe containing those two variables, including a universe that also contains the variables AtoB and BtoA.  In the same way, we should consider algorithm ABSpec to say something about a universe that contains the variables AtoB and BtoA and perhaps other variables as well.  To do that, we need to change the way we view algorithms.

Our method of representing an execution of an algorithm as a sequence of states, which we call a behavior, is quite conventional.  In the conventional view, an algorithm generates behaviors.  Since algorithm ABSpec generates values only for the variables AVar and BVar, it is then natural to think that the algorithm is specifying behaviors containing only those two values.  However, instead of thinking of ABSpec as a generator of behaviors, we will view it as a rule for deciding whether a behavior satisfies the algorithm.  In other words, we think of ABSpec as a predicate on behaviors—that is, a formula that is true or false of a behavior.  It is true of a behavior iff the values of AVar and BVar on the states in that sequence describe a possible execution of the algorithm.  We use the term property to mean a predicate on behaviors.  You should think of the PlusCal algorithm ABSpec as a property.  It is a mathematical formula, written in an unmathematical language. 

TLA+ is based on temporal logic, which means it is a language for writing properties.  For example, the property asserting that ABSpec_TypeOK is true on all states of a behavior is written in TLA+ as []ABSpec_TypeOK Thus, ABSpec_TypeOK is an invariant of ABSpec means that every behavior satisfying the property ABSpec also satisfies the property []ABSpec_TypeOK, which is equivalent to the assertion that property ABSpec implies property []ABSpec_TypeOK.  Satisfying a property is implication. 

The formula written in PlusCal as algorithm ABSpec is written in TLA+ as the formula Spec defined by the algorithm's translation.  When writing an algorithm in PlusCal, you don't care how Spec is defined in TLA+.  (For debugging the algorithm, it helps to understand a little about the definition.)  But you should know that satisfying a property is implication.  That ABSpec_TypeOK is an invariant of ABSpec is asserted by the truth of the formula Spec => []ABSpec_TypeOK.  You can tell TLC to check that this formula is true by adding []AB_TypeOK to the list of properties to be checked (in the Properties subfield of the What to check? field of the model's Model Overview tab).  This is equivalent to adding AB_TypeOK to the list of invariants to be checked.

We also want AB implements ABSpec to mean that every behavior satisfying property AB also satisfies property ABSpec, so implements also means implies.  A little thought shows that making this true just requires ABSpec to allow behaviors to have steps that do not change either of its variables.  A lot of thought led me to design TLA+ so that no property written in it can either forbid or require steps that change none of the property's variables.  Thus, no PlusCal algorithm can do that, since the meaning of the algorithm is (the property specified by) its TLA+ translation.

To check that algorithm AB implements algorithm ABSpec, we have TLC check that AB implements the property ABSpec, which is formula Spec of module Session11a.  We could do this by copying the definition of Spec in module Session11a—along with the other definitions in terms of which Spec is defined—into module Session11b, changing the name Spec and other names that clash with ones defined in Session11b.  TLA+ provides an easier way to do that. The TLA+ statement

Foo == INSTANCE Session11a 
imports into the current module all the definitions from module Session11a, except with their names changed by prepending Foo!.  (I used the identifier Foo just for fun; we should use a more helpful name like ABSpec.)  Thus, formula ABSpec_TypeOK of module Session11a is imported into the current module as formula Foo!ABSpec_TypeOK, and formula Spec of that module is imported as formula Foo!Spec.  Add the INSTANCE statement to module Session11b after the algorithm's translation, and add Foo!Spec to the list of properties the model should check.  Now run the model.  TLC should report no error. 

Algorithm/property AB satisfies/implies property Foo!Spec, which is algorithm/property ABSpec written in TLA+.  Algorithm ABSpec satisfies/implies []ABSpec_TypeOK.  Since implication is transitive, we can deduce that AB satisfies/implies []ABSpec_TypeOK—in other words, ABSpec_TypeOK is an invariant of AB.  Since ABSpec_TypeOK is imported into module Session11b as Foo!ABSpec_TypeOK, you can check this by having TLC check that Foo!ABSpec_TypeOK is an invariant of AB.

Implementation with Data Refinement

We have just seen the simplest form of implementation, where one algorithm implies another.  A more complicated form of implementation involves data refinement.  Module Session11c contains a new version of the AB protocol called algorithm AB2.  It was obtained by replacing variables AVar and BVar by variables AVar2 and BVar2, whose values are ordered pairs rather than records.  The pair <<d, b>> corresponds to the record [data |-> d, bit |-> b] in algorithms AB and ABSpec

Algorithm AB2 implements ABSpec with a data refinement in which variable AVar of ABSpec is implemented by the expression AVar_Impl, defined by:

AVar_Impl == [data |-> AVar2[1], bit |-> AVar2[2]]
and variable BVar is implemented by BVar_Impl, defined by:

BVar_Impl == [data |-> BVar2[1], bit |-> BVar2[2]]
This means that for any sequence seq of states satisfying AB2:
If we create a sequence seq_ABSpec of new states by replacing each state in seq with a state in which AVar and BVar equal the values of AVar_Impl and BVar_Impl, respectively, in that state of seq, then the sequence seq_ABSpec satisfies algorithm ABSpec.
Note that it makes no difference what values variables other than AVar and BVar have in the states of seq_ABSpec.

Make sure you understand what it means for AB2 to implement ABSpec with this data refinement.  Write down some (finite) sequence seq of states, each defined by the values of the four variables of AB2, that are generated by executing the PlusCal code of AB2.  Then convert seq to a sequence seq_ABSpec of states of ABSpec, each defined by the values of AVar and BVar, obtained by letting their values equal the values of AVar_Impl and BVar_Impl, respectively, in the corresponding states of seq_ABSpec.  You will get a behavior that differs from what most people would call an execution of ABSpec only because it has extra steps in which states are repeated.  As we've seen, when we view an algorithm as a property rather than as a machine for generating states, that property allows those extra steps. 

I don't expect you to understand why the following statement is true: AB2 implements ABSpec with this data refinement iff AB2 satisfies the property obtained by substituting AVar_Impl for AVar and BVar_Impl for BVar in the TLA+ formula that represents algorithm ABSpec.  (That formula is formula Spec of module Session11a.)  Because it is true, we can use TLC to check that AB2 implements ABSpec with this data refinement.  The statement

Foo == INSTANCE Session11a WITH AVar <- AVar_Impl, BVar <- BVar_Impl
imports into the current module all the definitions from module Session11a, renamed as before, except with the expressions AVar_Impl and BVar_Impl of the current module substituted for the variables AVar and BVar of module Session11a.  In particular, in the current module it defines Foo!Spec to be formula Spec of module Session11a (the TLA+ representation of algorithm ABSpec), with those substitutions.  Therefore, algorithm AB2 implements algorithm ABSpec iff algorithm AB satisfies the property Foo!Spec.  Module Session11c contains the definitions of AVar_Impl and BVar_Impl and this INSTANCE statement.  Have TLC check that algorithm AB2 satisfies property Foo!Spec

There is no need to define AVar_Impl and BVar_Impl.  We could eliminate them and write the INSTANCE statement as:

Foo == INSTANCE Session11a                                         
           WITH AVar <- [data |-> AVar2[1], bit |-> AVar2[2]], 
                BVar <- [data |-> BVar2[1], bit |-> BVar2[2]]
I defined them because we needed names for those expressions in the explanation of implementation with data refinement.  It would have made the explanation of what we were doing incomprehensible, but we could have used the names AVar and BVar instead of AVar_Impl and BVar_Impl in module Session11c.  The INSTANCE statement would then have become:

Foo == INSTANCE Session11a WITH AVar <- AVar, BVar <- BVar
which can be abbreviated as

Foo == INSTANCE Session11a 
An INSTANCE statement specifies what to substitute for each declared variable and constant.  If an explicit substitution is omitted, an expression with the same name is substituted for that variable or constant.  Because the INSTANCE statement in module Session11b had no WITH clause, the variables AVar and BVar declared in module Session11b (in the translation of the PlusCal algorithm) were substituted for the variables of the same name in module Session11a.  In the INSTANCE statements in modules Sessiion11b and Session11c, the constant Data declared in those modules was substituted for the constant Data declared in module Session11a.

The implementation of ABSpec by AB2 is a toy example.  There is no reason to use a data refinement that represents a record having two fields by an ordered pair.  In practice, data refinement represents a higher-level mathematical data object by a lower-level one that is closer to the way it will be implemented in a program.  In sequential algorithms, a set might be represented by a B-tree.  In distributed algorithms, a queue of messages might be represented by sequentially numbered messages in transit throughout a network. 

I have only sketched the basic idea of implementation and data refinement.  There is more to learn about how to write and verify higher-level specifications of algorithms, but it may be best learned in terms of TLA+ rather than PlusCal. 

Programs

An algorithm you write will probably be implemented by a program, or perhaps a collection of programs in a system.  Understanding what it means for one algorithm to implement another enables you to understand what it means for a program to implement an algorithm. 

A program as well as an algorithm can be viewed as a property.  PlusCal was designed to make the corresponding TLA+ property simple.  Programming languages are designed to make programs efficiently executable on computers.  The property corresponding to a program written in a practical programming language is complicated, whether it's written in TLA+ or any other language for writing mathematical formulas.  However, you should have an intuitive idea of what constitutes the state of a program.  The most obvious part of the state is the collection of variables and their values, and the objects pointed to by the variables.

There are not yet industrial-strength tools for verifying that a program implements an algorithm written in PlusCal or TLA+.  But you should understand approximately how the values of the variables of an algorithm are represented by the state of a program that implements it.  This understanding will help you design and test the program.  For example, it will allow you to add assertions to the program to test that, when algorithm variables are replaced by the program expressions that represent them, an invariant of the algorithm is satisfied by the program.