1 The Problem

1.2 The Graph

2 The Algorithm

Prev:  Intermezzo 1

Next:  Intermezzo 2

Contents

# Session 10   A Distributed Algorithm

Leslie Lamport

## 1  The Problem

### 1.1  The Real Problem

We now consider an algorithm introduced in the 1970s for routing messages in a computer network.  The network consists of a collection of computers, each of which can send messages directly to some of the other computers, which we'll call its neighbors.  A message sent by a computer to a computer not a neighbor has to be relayed by other computers to its destination.

Let a path in the network be a sequence of computers, each of which (other than the first) is a neighbor of the preceding one.  A path is a possible route that a message can traverse when sent from the first to the last computer in the path.  Let the length of the path be the number of computer to computer transmissions taken by a message following that path, so it's one less than the number of computers in the path.  We consider a sequence consisting of a single computer to be a path of length zero from that computer to itself.  The problem designers of the network had was to find an algorithm for routing messages that ensured messages sent from one computer to another traveled along a shortest possible path.

Computers and transmission lines can fail and be repaired, changing the set of computers and/or the set of neighbors of a computer.  At any time, a computer can directly observe only who its neighbors are.  Message-routing information has to be recomputed when that happens.

When developing an algorithm like this one, it helps to start simple.  Instead of a dynamic algorithm that continually updates the routing information as the network changes, we consider a fixed network.  Initially, each computer knows who its neighbors are.  The algorithm should compute the routing information and stop.  A solution to that problem can guide us to a more general dynamic algorithm.

We'll simplify the problem still more.  Instead of computing routing information for sending messages from any computer to any other computer, we will compute that information for sending messages to one specific computer, which we call the root.  We assume that each computer knows who its neighbors are, but only the root knows it's the root.  Our algorithm will give each computer the information it needs to send a message to the root; it can easily be modified to distribute the identity of the root as well.

From our algorithm, it will be easy to solve the more general problem of routing messages from any computer to any other computer.  We just have to write an algorithm that executes in parallel multiple copies of the simple algorithm, each copy having a different computer as the root.

At this point, you're probably expecting to see some pictures illustrating the problem.  Pictures are very useful for helping us understand things.  But when you're faced with problems in the real world, there may not be anyone to draw pictures for you; you'll have to draw your own.  So, get a pencil and paper or whatever device you like to draw on, and draw a few pictures to help you visualize our problem.

### 1.2  The Graph

Computer networks are complicated, containing gazillions of components built out of silicon and other elements.  We have to abstract from all that stuff to a mathematical model of the network in which we can state and solve the problem.

As you probably know, the abstract model commonly used to describe a computer network is a graph.  We too will use that model.  The first thing we must decide is how to represent the graph mathematically.  How we do that depends on exactly what kind of graphs we need to consider.  A graph consists of a set of nodes connected by edges.  When I asked you to draw helpful pictures, you probably drew graphs.  Did any of your graphs contain the following?

1. Arrows on the edges indicating a single direction in which messages travel.
2. Two different edges connecting the same two nodes.
3. An edge from a node to itself.
4. Two nodes not connected by a path.
5. A node with no edges attached to it.
These are all things that graphs might contain.  To decide how to represent our graph, we have to decide which of them it can contain.  When you drew your graphs, did you consider all these possibilities?  If you didn't, let that be a warning against relying on pictures.  Pictures can be helpful, but to avoid errors you have to think mathematically, not just with pictures.

For reasons I will explain, we will assume that our graph does not contain any of the first four possibilities.  We rule out possibility 1 because in many networks, including the one for which the algorithm was intended, sending a message from one computer to its neighbor requires an exchange of messages.  One computer can send messages to another iff the other can send messages to it, so there is no need for edges in the graph to have a direction.

We can rule out possibility 2 because an edge between two nodes indicates only that they can send messages to one another; multiple edges between them serve no purpose.  Ruling out possibilities 1 and 2 allows us to represent an edge joining nodes m and n by the set {m,n} (which of course is the same as the set {n,m}).  We can rule out possibility 3 because, even if a computer does send messages to itself, there is no need to route those messages over the network.  Therefore, we can represent an edge by a set {m,n} where m and n are different nodes.

Having ruled out possibilities 1–3, we can put our algorithm in a module, declaring our sets of nodes and edges and the root node with:

   CONSTANT Nodes, Edges, root

The module should also contain an assumption asserting that root is a node and that every edge is a set of nodes containing exactly two elements.  If we define Has2Elts(S) to be true iff the set S contains exactly two elements, then this assumption can be written:

   ASSUME /\ root \in Nodes
/\ \A e \in Edges : (e \subseteq Nodes) /\ Has2Elts(e)


Exercise 1   Write a definition of the operator Has2Elts Show the answer.

Answer   Has2Elts(S)  ==  \E a, b \in S : (a /= b) /\ S = {a, b}

Here is a simpler way to define Has2Elts.  The standard TLA+ module FiniteSets defines the operator Cardinality so that Cardinality(S) equals the number of elements in S,  for any finite set S.  So, Has2Elts(S) equals Cardinality(S) = 2.

Possibility 4 means that some computers can't send messages to the root.  An algorithm can provide no information to those computers.  For simplicity, we ignore those computers by pretending they don't exist, which we do by ruling out possibility 4.  Ruling it out means writing an assumption asserting that every node is connected by a path to every other node.  Before we can write such an assumption, I need to explain a little math.

### 1.3  Recursive Definitions

Asserting the assumption that rules out possibility 4 requires a recursive definition.  If you've programmed in a functional programming language, you should be familiar with recursive definitions and can skip to Exercise 2.

Recursion, which is called induction by mathematicians, allows you to define mathematically anything you can compute with an algorithm.  The classic example is the definition of factorial, where N factorial, usually written N!, equals 1 * 2 * ... * (N-1) * N for N greater than 0.  It's customary to define 0! to equal 1, so N! is defined for all natural numbers.

How would we define N! mathematically?  The expression I wrote contains ..., which isn't mathematics.  A mathematician would say:

If N equals 0, then N! equals 1.  Otherwise, it equals N * (N-1)!.
This precisely specifies the value of N! for any natural number N.  It even tells how to compute it, without any ... or and so on.  Here is how we can express that mathematical definition in TLA+.

Since TLA+ and PlusCal don't let us write N!, we'll write it as factorial(N).  The mathematician's words can be expressed with this TLA+ definition:

   factorial(N)  ==  IF N = 0 THEN 1
ELSE N * factorial(N-1)

If we put this definition in a module, the TLA+ parser will complain that the factorial in the ELSE expression is undefined.  That's because factorial is undefined until after the definition.  TLA+ (unlike most programming languages) requires us to declare that this use of factorial before its definition is intentional by preceding it with:

   RECURSIVE factorial(_)

With this definition, TLC can compute factorial(N) for small enough values of N — values up to 12 on my computer.  How big a value can it handle on your computer?

The common form of a recursive definition of an operator Op is to define Op(x) on some set of values by defining its value explicitly if x is a smallest element of the set, and defining its value in terms of the value of Op when applied to one or more values smaller than x.  For example, we define factorial(N) for any natural number N by explicitly defining it when N is 0, the smallest natural number, and defining it in terms of factorial(N-1) if N-1 is any other natural number.  The definition doesn't state that N has to be a natural number, but the semantics of TLA+ does not specify the value of factorial(N) if N is not a natural number, in which case TLC will report an error when evaluating it.

This kind of recursive definition is used to define Op(seq) for seq in some set of sequences.  We define Op(<< >>) explicitly, and we define Op(seq) in terms of Head(seq) and Op(Tail(seq)) if seq is a non-empty sequence.  Here, smaller than means shorter than, Tail(seq) being shorter than seq.

Exercise 2   Define an operator SumSeq so that for any sequence seq of numbers, SumSeq(seq) is the sum of the elements of seq, where SumSeq(<< >>) equals 0.  Show the answer.

   RECURSIVE SumSeq(_)
SumSeq(seq)  ==  IF seq = << >> THEN 0


Another example of this kind of recursive definition is defining Op(S) for S in some set of sets by defining Op({}) and defining Op(S) in terms of the value of Op when applied to one or more sets smaller than S if S is nonempty.  For example, let's define SumSet(S) to be the sum of a finite set S of numbers.  The idea is to define:

SumSet(S)  ==  IF S = {} THEN 0
ELSE x + SumSet(S \ {x}) for some x in S

But how do we write some x in S in math?  In TLA+, it is written

    CHOOSE x \in S : TRUE

So, the definition can be written

   RECURSIVE SumSet(_)
SumSet(S)  ==  IF S = {} THEN 0
ELSE (CHOOSE x \in S : TRUE) +
SumSet(S \ {CHOOSE x \in S : TRUE})

The following equivalent definition, using the LET/IN construct, is easier to read:

   RECURSIVE SumSet(_)
SumSet(S)  ==  IF S = {} THEN 0
ELSE LET y == CHOOSE x \in S : TRUE
IN  y + SumSet(S \ {y})

If you are not already familiar with CHOOSE, do not use it for anything except this kind of recursive definition.  If you do, you will almost certainly use it incorrectly.  One place to learn how to use it correctly is Section 6.6 of Specifying Systems

### 1.4  Asserting Connectivity

We can now assert the assumption that possibility 4 does not occur.  This means asserting the property that every two nodes are connected by a path, in which case we say that the graph is connected.  If every node is connected by a path to the root, then any two nodes are connected by a path through the root.  Therefore, the graph is connected iff for any node, there is a path to it from the root.  (Remember that <<root>> is a path from the root to itself.)

How do we compute whether there is a path from the root to a node n?  We first check if n equals the root.  If it doesn't, we check if it's a neighbor of the root.  If it isn't, we check if it's a neighbor of a neighbor of the root.  And so on, until we either find n or have examined all the nodes.  In the latter case, there is no path to n from the root.

What this computation does is find all nodes reachable by a path from root until we either find n or have found all such nodes.  That possibility 4 does not occur means that every node is reachable by a path from root, so that the set of reachable nodes equals the set Nodes of all nodes.  So to rule out possibility 4, we have to define the set of all nodes reachable from the root.

A simple way to do this is to define ReachableFrom(S) for a set S of nodes to be the set of all nodes reachable by a path from a node in S.  We then rule out possibility 4 by asserting:

   ASSUME  ReachableFrom({root}) = Nodes

Let SetNbrs(S) be the set of all neighbors of nodes in the set S of nodes.  The way we compute the nodes reachable from the root is based on the observation that ReachableFrom(S) equals ReachableFrom(S \cup SetNbrs(S)), for any set S of nodes.  This leads to a recursive definition that differs from the recursive definitions we've seen thus far because it defines ReachableFrom(S) in terms of ReachableFrom(T) for a set T that is larger than S rather than smaller than S.  The recursive computation stops when ReachableFrom(S) is as big as it can be, which is when SetNbrs(S) is a subset of S.  This will eventually happen if the set of nodes is finite, which is the only case that concerns us.  So, we have this definition:

   RECURSIVE ReachableFrom(_)
ReachableFrom(S)  ==  LET R == SetNbrs(S)
IN  IF R \subseteq S THEN S
ELSE ReachableFrom(R \cup S)

This definition has to be preceded by the definition of SetNbrs.  To define SetNbrs, we first define the set Nbrs(n) to be the set of neighbors of a node n.  A neighbor of n is a node m for which there is an edge joining m and n.  That edge must be the edge {m,n}, so Nbrs(n) is the set of all nodes for which {m,n} is an edge.  In other words:

   Nbrs(n)  ==  {m \in Nodes : {m,n} \in Edges}

SetNbrs is the union of all the sets Nbrs(n) for n in Node.  In other words:

   SetNbrs(S)  ==  UNION {Nbrs(n) : n \in S}


Exercise 3   Evaluating ReachableFrom from our recursive definition above is rather inefficient.  Once a node n is added to the set S, the value of Nbrs(n) keeps being computed in every iteration of the recursion.  Find a more efficient way of computing ReachableFrom(S) and use it to write a recursive definition that is evaluated more efficiently.  Show the answer.

Answer   A more efficient algorithm maintains the set S as the union of two disjoint sets, a set T containing interior nodes of S all of whose neighbors are in S, and a set B of boundary nodes of S that may contain neighbors not in S.  The computation terminates when B is the empty set.  This computation describes the evaluation of the operator RF defined by:

   RECURSIVE RF(_, _)
RF(B, T)  ==  IF B = {} THEN T ELSE RF(SetNbrs(B) \ T, B \cup T)

ReachableFrom(S) can then be defined to equal RF(S, {}).

We have disallowed possibilities 1–4.  Ruling out possibility 4 rules out possibility 5 except in one case: when root is the only node.  There is no reason not to allow that case.

## 2  The Algorithm

### 2.1  The General Idea

The purpose of our algorithm is to have each non-root node n find a neighbor to whom it will forward messages towards the root.  We'll call that node the parent of n.  The path obtained from any node by going from each node to its parent, ending if and when it reaches the root, will be called the parent path.  (If you haven't been drawing pictures to help you, this might be a good time to start.)

There's an obvious algorithm for having each non-root computer choose its parent.  Initially the parent of the root is itself and it doesn't matter who the parents of the other nodes are.  The root sends a message to each of its neighbors.  Each of its neighbors sends a message to its other neighbors, and so on.  A non-root node chooses as its parent the node that sent the first message it received.  (Don't confuse these algorithm messages with the messages that are to be routed to the root.)

You should convince yourself that the parents computed by this algorithm have the property that once a node has chosen its parent, the parent path from that node leads to the root.  Eventually, every node will have chosen its parent and the algorithm will terminate.  However, we require an algorithm to route messages along a minimal-length path to the root.  You should be able to see that the obvious algorithm doesn't do that.  If there are multiple paths (without cycles) from a node n to the root, then any of those paths can be made the parent path from n by the proper choice of the order in which messages are received.  We don't want to make any assumptions about how long it takes messages to be received, so we must modify the algorithm to ensure that parents are chosen to make all parent paths minimal-length paths to the root.  I'll now describe how we do that.

Let the depth of a node be the length of its shortest path to the root.  Each computer is represented by a process that maintains two values local to it: depth and parent.  I'll call depth[n] and parent[n] the values of those variables for process n (which is what they're called in the TLA+ translation of the algorithm).  When the algorithm terminates, depth[n] will equal the depth of node n and the parent paths defined by all the values parent[n] are shortest paths to the root.

Initially, depth[root] equals zero and parent[root] equals root; and depth[n] equals ∞ and the initial value of parent[n] doesn't matter for a non-root node nThe algorithm maintains this invariant:

For every node n:
• depth[n] is greater than or equal to the depth of n.
• If depth[n] is less than ∞, then the parent path from n (determined by the current values of parent[m] for all nodes m) leads to the root and has length at most depth[n].
The algorithm begins with the root sending a message containing the value 0 to all its neighbors.  From then on, every node n does the following when it receives a message:
Action of Node n on Receipt of Message
If the number v contained in the message is less then depth[n]-1, then set depth[n] to v+1, set parent[n] to the message's sender, and send a message with value v+1 to every neighbor of n except the original message's sender.  Otherwise, ignore the message.
A message that is not ignored causes the node n that receives it to decrease the value of depth[n].  However, the value of depth[n] is always ∞ or a natural number, so it can be decreased only a finite number of times.  Eventually, any remaining messages will be ignored.  When all those messages have been received, the algorithm terminates.

Correctness of the algorithm means that, when it terminates, the parent path from any node is a minimal-length path to the root.  The invariant described above implies that this is the case if depth[n] eventually equals the depth of n for every node n.  If n has depth 0, then it is the root, and depth[root] initially equals 0.  If n has depth 1, it eventually receives a message from the root causing it to set depth[n] to 1.  If n has depth 2, it eventually receives a message sent by a node of depth 1 upon receiving a message from the root, which causes n to set depth[n] to 2.  And so on.  If you've learned to write proofs by mathematical induction, you can translate the and so on to a rigorous argument that eventually depth[n] equals the depth of n, for every node n.

### 2.2  The PlusCal Code

We now turn the idea for an algorithm described above into PlusCal.  We've decided that the algorithm will represent each computer as a process, so the set of processes will be the set Nodes of nodes.  The variables local to each process are depth and parent.  The initial value of depth[n] for a non-root node n should be ∞, which we could represent by a constant Infty.  However, this would require defining a less-than relation, which we might call \prec (pretty-printed as ≺), on the set Nat \cup {Infty}.  It can be defined as:

m \prec n  ==  IF m = Infty THEN FALSE
ELSE IF n = Infty THEN TRUE ELSE m < n

For simplicity, instead of ∞, we will use a number bigger than the depth of any node.  The depth of a node has to be smaller than the number of nodes.  We therefore add to the module a declaration of the constant MaxNodes and the assumption

   MaxNodes >= Cardinality(Nodes)

which requires adding the module FiniteSets that defines Cardinality to the EXTENDS statement.  We let the initial value of depth equal 0 for the root node and equal MaxNodes for all other nodes.

The initial value of the variable parent for the root should equal the root itself.  The initial value of parent doesn't matter for other nodes; for simplicity we let it equal the node itself for all nodes.

A multiprocess algorithm needs one or more global variables that processes can use to communicate with one another.  In our algorithm, processes communicate by messages, so we have to decide how to represent the sending and receiving of the algorithm messages.  A node n can send more than one message to another node m, and it's possible for n to send a message to m before the previous message has been received.  A common way to represent this kind of message passing is with a queue containing the sequence of messages in transit from n to m.  Process n sends a message to m by appending it to the end of the queue; and m receives a message by removing it from the front of the queue.  This implies that messages from n to m are received in the order in which they were sent.  In-order reception isn't necessary, but let's keep things simple for now.

We let <<n,m>> be the name of the queue of messages from n to m and let the algorithm have a global variable msg, where msg[<<n,m>>] is the sequence of messages in the queue <<n,m>>.

In real programs, messages contain headers and checksums and other things.  That stuff doesn't belong in an algorithm, which should describe only the relevant information in the message.  The information in a message of our algorithm is just a number, so msgs[<<n,m>>] is a sequence of numbers, for each queue <<n,m>>.  Instead of the algorithm beginning by having the root send messages containing 0 to its neighbors, for simplicity we let the initial state be one in which those messages have already been sent.  So the algorithm, which we call FindRoutes, looks like this, where Queues is the set of all queues.

   --algorithm FindRoutes {
variable msgs = [q \in Queues |-> IF q[1] = root THEN <<0>>
ELSE << >>] ;
fair process (node \in Nodes)
variables depth = IF self = root THEN 0 ELSE MaxNodes ,
parent = self ;
{ the process's code
}
}


Let's now write the definition of Queues, which must appear in the module before the algorithm.  For every node n and every neighbor node m of n, there is a queue <<n,m>> from n to m and a queue <<m,n>> from m to n.  We first define QueuesFrom(n) and QueuesTo(n) to be the set of queues from and to n, respectively:

   QueuesFrom(n)  == {<<n, m>> : m \in Nbrs(n)}
QueuesTo(n)  ==  {<<m, n>> : m \in Nbrs(n)}

You can now define Queues.

Exercise 4   Define the set Queues of all queues.  Show the answer.

Answer   I can think of the following simple definitions:

Queues  ==  UNION {QueuesFrom(n) : n \in Nodes}
Queues  ==  UNION {QueuesTo(n) : n \in Nodes}
Queues  ==  {e \in Nodes \X Nodes : {e[1], e[2]} \in Edges}


Each process repeatedly executes the node action above until it stops receiving messages.  A process that has received no message with the value 0 or 1 can never be sure that it will receive no more messages.  (Showing why is a nice puzzle.)  So, we let processes always be ready to receive a message.  The algorithm terminates when there are no messages in any queue—a condition that TLC considers to be deadlock.

The body of a process will therefore be a while (TRUE) loop, each iteration of the loop receiving and acting on a single message to that process. The body of that loop for process n must choose an arbitrary queue in QueuesTo(n) and process its first message.  It does that with a statement

 with (q \in S) {...}

where S is the set of all queues r in QueuesTo(n) for which msgs[r] contains at least one message.  Since the process's code is for the process with id self, all this is expressed by:

   a: while (TRUE) {
with (q \in {r \in QueuesTo(self) : msgs[r] /= << >>}) {
if (Head(msgs[q]) < depth - 1) {
depth := Head(msgs[q]) + 1 ;
parent := q[1] ;
Compute changes to msgs
} else { msgs[q] := Tail(msgs[q]) }  }  }

To complete the algorithm, we just have to write the code to compute the changes to msgs.  But first, let's take a closer look at the with statement.

We saw in Session 4 that in a single-process algorithm, if S equals the empty set { }, then executing the statement with (q \in S) {...} causes the program to halt.  But we've just written such a statement in which the set S equals the empty set if all the queues in QueuesTo(self) are empty.  In that case, the whole program doesn't stop; just process self stops because the with statement cannot then be executed.  It could be executed later, when an action of some other process changes the value of S.  In a single-process algorithm, there is no other process, so if S is empty then it will remain empty forever and the program is deadlocked.  This with statement is completely equivalent to:

await S /= { } ;
with (q \in S) {...}


We now return to the problem of writing the code to compute the changes to msgs.  If we were writing a program, this would be easy.  The code would set msgs[q] to its tail and then execute a while loop to append depth to msgs[r] for each r in QueuesFrom(self).  If you do that, you will find that the translator requires there to be a label on the inner while loop.  A single step of a PlusCal algorithm can execute at most one assignment to any variable.  (PlusCal has a multiple assignment statement that can assign values to several elements of an array in a single step, but it won't help us here.)

Using a while makes sending a message to each of the process's neighbors a separate step.  This would be OK if we were writing a program.  But we're writing an algorithm not a program.  In our algorithm, we want the entire processing of the message, which means the entire body of the while (TRUE) loop, to be a single step.  The obvious reason for writing the coarser-grained algorithm is that it makes model checking more efficient.  TLC will be able to check the algorithm on larger models.  But the most important reason is that it's easier to think about a coarser-grained algorithm.  This is because the invariant that explains why the algorithm is correct is simpler.  We may ultimately decide that implementing the coarser-grained actions is too difficult, and we need to write a finer-grained algorithm.  However, we should still begin with the coarser-grained one.

In our algorithm, we want Compute changes to msgs to be a single assignment to msgs.  Here is an assignment statement that does it:

   msgs := [r \in Queues |->
IF r = q THEN Tail(msgs[q])
ELSE IF r \in QueuesFrom(self) \ {<<self, q[1]>>}
THEN Append(msgs[r], depth)
ELSE msgs[r] ]

The right-hand side of this assignment statement is the most complicated mathematical expression we've encountered so far.  I hope that by now you are not intimidated by such an expression, and you can understand it by taking it apart piece by piece.

The complete algorithm is contained in module Session10.  Open the module in the Toolbox now and make sure you understand the algorithm and the declarations and definitions that precede it.

### 2.3  Checking Correctness

You should have been using TLC to help you solve the exercises, checking the correctness of your answers before looking at the answers that I've provided.  If opening module Session10 in the Toolbox and running the translator produces no parsing errors, then you've probably downloaded exactly the module that I created.  Let's now check correctness of the algorithm it contains.

Correctness of the algorithm means that the parent paths it constructs are shortest paths to the root.  But before checking that, we should first check a correctness condition that every algorithm's specification should satisfy: It should specify whether any behavior — that is, whether any sequence of states — does or does not satisfy the specification.  Failure to satisfy this condition causes TLC to report an execution error.  For example, a specification does not satisfy the condition if it allows the value of a variable in some state to have the value Tail(<< >>), since TLA+ does not specify what that value is.  So, the first thing we do is check if this condition is satisfied by running TLC on a model of the algorithm and seeing if it raises an error, without telling it to check anything.

To create a model on which to run TLC, we first have to decide what value to substitute for the set Nodes.  We could substitute any set; a set of numbers or a set of strings would be obvious choices.  I prefer to use a set of model values.  A model value is a value that TLC assumes is different from any other value it encounters.  Grammatically, it can be any identifier that is not defined or declared in the specification.  Normally, you should begin by checking a tiny model; I would start with a set of two nodes.  But let's use a model for which I've already checked the algorithm.  Create a new model and, in the menu for specifying the value of the constant Nodes, enter this value:

   {r, n1, n2, n3, n4, n5}

and select Set of model values.  (Don't select either Symmetry set or Type.)  Anywhere else in the model, including in specifications of other declared constants and in invariants, the model values r, n1, ... , n5 can be used like ordinary declared constants.  Have the model set root to r and set Edges to

   {{r,n1}, {r,n2}, {n1,n3}, {n2,n3}, {n3,n4}, {n3,n5}}

For both these substitutions, select Ordinary assignment.  Draw a picture of the graph described by these sets of nodes and edges.

Model values enable some features of TLC, the most important of which is taking advantage of symmetry to speed up model checking.  You should read about model values in the Toolbox's Model Values and Symmetry help page.  That page can be found by searching the Toolbox's Help for symmetry, or by clicking here.

Meanwhile, complete the model by assigning a big enough number to MaxNodes.  I used 99.  Now run the model.  It will report the error Deadlock reached.  (If you were clever enough to disable deadlock checking, enable it and run TLC again.)  Looking at the last state of the error trace shows that every message queue msgs[<<n,m>>] is empty, so the algorithm has terminated.  Remember that termination is halting when we want the algorithm to halt, and deadlock is halting when we don't want it to halt.  TLC has no way to know that we wanted execution to halt in this state, so it reports an error.

We can take advantage of this error trace to check that the algorithm did what it was supposed to do.  On your picture of the graph, draw an arrow from each non-root node to its parent.  Check that the parent path from each node is a shortest path to the root.

TLC's report of deadlock shows that there exists at least one terminating behavior of the algorithm.  We should check that every behavior terminates.  The algorithm has terminated iff every message queue is empty.  We want to show that this is eventually true in every behavior.  This condition is expressed by the temporal formula <>Terminated, where Terminated is defined in module Session10 by:

   Terminated  ==  \A q \in Queues : msgs[q] = << >>

Add <>Terminated to the model's list of properties to be checked and have TLC check that it's satisfied.

There is one additional property that should be checked for almost all algorithms: type correctness.  PlusCal and TLA+ are untyped languages, and this has led some people to believe that I think type checking is not important.  In fact, I believe that type correctness is very important —  too important to be left to the primitive type systems and incomplete type checking provided by programming languages.  Moreover, there is no need to add anything special for type checking because type correctness is expressed by an ordinary invariant.

Type correctness of an algorithm is expressed by an invariant asserting that the value of each variable is a member of a set called the variable's type.  I like to name the type-correctness invariant TypeOK.  Here is the definition of a simple type-correctness invariant for our algorithm:

   TypeOK == /\ depth  \in [Nodes -> Nat]
/\ parent \in [Nodes -> Nodes]
/\ msgs \in [Queues -> Seq(Nat)]

Even this weak type invariant can't be expressed by type declarations in many popular programming languages.  However, we can strengthen it to provide more information to a reader of the module, and so checking its invariance can catch more errors:

   TypeOK  ==  /\ depth  \in [Nodes -> 0..MaxNodes]
/\ parent \in [Nodes -> Nodes]
/\ msgs \in [Queues -> Seq(0..(Cardinality(Nodes)-1))]

This definition appears in module Session10.  TLC can check that TypeOK is indeed an invariant of the algorithm.

We can strengthen TypeOK still further.  For example, we could assert that msg[q] is a decreasing sequence of numbers, for every q in Queues, and that those numbers are all greater than or equal to depth[q[1]].  You should convince yourself that the algorithm actually satisfies the stronger invariant.

Exercise 5   Strengthen the definition of TypeOK in this way and have TLC check that it is still an invariant.  Show the answer.

Answer   One way to do this is by adding the following conjunct to the definition:

   /\ \A q \in Queues :
\A i \in 1..Len(msgs[q]) :
/\ \A j \in 1..(i-1) : msgs[q][j] > msgs[q][i]
/\ msgs[q][i] >= depth[q[1]]


It is not clear if the stronger invariant should still be called a type invariant rather than simply a property of the algorithm.  But whatever you call it, writing an invariant like this tests your understanding of the algorithm.

After checking type correctness, we have to check that the algorithm is actually correct.  Our algorithm is supposed to perform a computation and terminate.  We've already checked that it terminates.  We now want to check that it computes the right thing.  In other words, we want to check that it satisfies a postcondition.

We check that a single-process algorithm satisfies a postcondition with an assert statement that is executed when the algorithm terminates.  For a multiprocess algorithm, we use an invariant asserting that if the algorithm has terminated, then the postcondition is true.  We have defined Terminated to be the state predicate that is true iff our algorithm has terminated.  Correctness therefore means that this formula is an invariant of the algorithm:

   Terminated => Postcondition

for the appropriate definition of Postcondition

The postcondition our algorithm should satisfy is that the parent path from every node is the shortest path from that node to the root.  Let PPath(n) be the parent path from the node n, and let Dist(n, m) be the length of the shortest path from n to m.  I've defined a path to be a sequence of nodes; and I've defined the length of a path to be the number of node-to-node hops it makes, which is one less than the number of nodes in the path.  Thus, the length of a path is one less than the length of the sequence of nodes that is the path.  This is confusing, and I'm sorry about that.  It's what happened when I tried to keep the math simple and let the length of a path have its usual meaning.  Please forgive me and convince yourself that the postcondition's definition should be:

   Postcondition  ==  \A n \in Nodes : Len(PPath(n)) - 1 = Dist(n, root)

To complete the module, we must define PPath and Dist.  You can find my definitions in module Session10, but you should try to write your own definitions before looking at mine.  My definition of PPath(n) assumes that the parent path from n ends at the root and isn't an infinite sequence of nodes.  My definition of Dist(n,m) assumes that there is a path from n to m.  For fun, you might try to write definitions that don't make the assumptions I've made.  But first, you need to decide what their values should be when those assumptions don't hold.

In addition to checking type correctness and the correctness condition, it's a good idea to check that the algorithm satisfies other invariants.  Failure to satisfy an invariant could reveal an error in the algorithm even if the algorithm behaves correctly for the model you are checking.  If the algorithm is correct, incorrectness of your invariant will reveal that the algorithm allows behaviors you didn't expect.  Knowing this can help you avoid errors when implementing the algorithm.  As an exercise, write and check the invariant of our routing algorithm described above For this and the following exercises, check the algorithm on several graphs — starting with a small model and working up to larger models.  The smallest model, which you should check, is a graph containing only the root.  A small example that can catch many errors is the graph containing three nodes and three edges.

Exercise 6   As mentioned above, the algorithm does not depend on messages being delivered in order.  Modify the algorithm so messages in the message queues can be received in any order.  You should use the RemoveElt operator you defined in this exercise of Intermezzo 1.

Exercise 7   In our algorithm, a process n never sends the same message twice to a neighbor m.  If we allow messages to be received in any order, we can therefore represent the messages in transit from n to m by a set of messages rather than a sequence of messages.  Write this version of the algorithm.

Project   Our algorithm eventually terminates, having computed minimal-length paths from every node to the root.  However, no node can tell when that has happened.  Your problem is to enhance the algorithm so the root learns when computation of the paths has finished.  Nodes will obviously have to send information to the root.  This is not an easy problem.

### 2.4  An Observation

Look at the TLA+ translation of the algorithm.  Notice that it does not have a pc variable.  That variable would serve no purpose, since it would be a constant—the value of pc[p] for any process p would always equal "a".  This is because a is the only label and it is impossible for the process to terminate, so pc[p] could never equal "Done"

In the algorithm, a process repeatedly performs a single action, described by the body of the while (TRUE) loop, whenever that action is enabled.  This is common for high-level descriptions of distributed algorithms.