1 The Problem
1.1 The Real Problem 1.2 The Graph 2 The Algorithm 2.1 The General Idea 2.2 The PlusCal Code 2.4 An Observation Prev: Intermezzo 1 Next: Intermezzo 2 Contents 
1 The Problem1.1 The Real ProblemWe now consider an algorithm introduced in the 1970s for routing messages in a computer network. The network consists of a collection of computers, each of which can send messages directly to some of the other computers, which we'll call its neighbors. A message sent by a computer to a computer not a neighbor has to be relayed by other computers to its destination. Let a path in the network be a sequence of computers, each of which (other than the first) is a neighbor of the preceding one. A path is a possible route that a message can traverse when sent from the first to the last computer in the path. Let the length of the path be the number of computer to computer transmissions taken by a message following that path, so it's one less than the number of computers in the path. We consider a sequence consisting of a single computer to be a path of length zero from that computer to itself. The problem designers of the network had was to find an algorithm for routing messages that ensured messages sent from one computer to another traveled along a shortest possible path. Computers and transmission lines can fail and be repaired, changing the set of computers and/or the set of neighbors of a computer. At any time, a computer can directly observe only who its neighbors are. Messagerouting information has to be recomputed when that happens. When developing an algorithm like this one, it helps to start simple. Instead of a dynamic algorithm that continually updates the routing information as the network changes, we consider a fixed network. Initially, each computer knows who its neighbors are. The algorithm should compute the routing information and stop. A solution to that problem can guide us to a more general dynamic algorithm. We'll simplify the problem still more. Instead of computing routing information for sending messages from any computer to any other computer, we will compute that information for sending messages to one specific computer, which we call the root. We assume that each computer knows who its neighbors are, but only the root knows it's the root. Our algorithm will give each computer the information it needs to send a message to the root; it can easily be modified to distribute the identity of the root as well. From our algorithm, it will be easy to solve the more general problem of routing messages from any computer to any other computer. We just have to write an algorithm that executes in parallel multiple copies of the simple algorithm, each copy having a different computer as the root. At this point, you're probably expecting to see some pictures illustrating the problem. Pictures are very useful for helping us understand things. But when you're faced with problems in the real world, there may not be anyone to draw pictures for you; you'll have to draw your own. So, get a pencil and paper or whatever device you like to draw on, and draw a few pictures to help you visualize our problem. 1.2 The GraphComputer networks are complicated, containing gazillions of components built out of silicon and other elements. We have to abstract from all that stuff to a mathematical model of the network in which we can state and solve the problem. As you probably know, the abstract model commonly used to describe a computer network is a graph. We too will use that model. The first thing we must decide is how to represent the graph mathematically. How we do that depends on exactly what kind of graphs we need to consider. A graph consists of a set of nodes connected by edges. When I asked you to draw helpful pictures, you probably drew graphs. Did any of your graphs contain the following?
For reasons I will explain, we will assume that our graph does not contain any of the first four possibilities. We rule out possibility 1 because in many networks, including the one for which the algorithm was intended, sending a message from one computer to its neighbor requires an exchange of messages. One computer can send messages to another iff the other can send messages to it, so there is no need for edges in the graph to have a direction.
We can rule out possibility 2 because an edge between two nodes
indicates only that they can send messages to one another; multiple
edges between them serve no purpose. Ruling out possibilities 1
and 2 allows us to represent an edge joining nodes Having ruled out possibilities 1–3, we can put our algorithm in a module, declaring our sets of nodes and edges and the root node with:
CONSTANT Nodes, Edges, rootThe module should also contain an assumption asserting that root is a node and that every edge is a set of nodes
containing exactly two elements. If we define
Has2Elts(S) to be true iff the set S
contains exactly two elements, then this assumption can be written:
ASSUME /\ root \in Nodes /\ \A e \in Edges : (e \subseteq Nodes) /\ Has2Elts(e)
Exercise 1 Write a definition of the operator
Answer
Has2Elts .
The standard TLA+ module FiniteSets defines the operator
Cardinality so that Cardinality(S)
equals the number of elements in S ,
for any finite set S . So, Has2Elts(S)
equals Cardinality(S) = 2 .
Possibility 4 means that some computers can't send messages to the root. An algorithm can provide no information to those computers. For simplicity, we ignore those computers by pretending they don't exist, which we do by ruling out possibility 4. Ruling it out means writing an assumption asserting that every node is connected by a path to every other node. Before we can write such an assumption, I need to explain a little math. 1.3 Recursive DefinitionsAsserting the assumption that rules out possibility 4 requires a recursive definition. If you've programmed in a functional programming language, you should be familiar with recursive definitions and can skip to Exercise 2.
Recursion, which is called induction by mathematicians, allows you to
define mathematically anything you can compute with an
algorithm. The classic example is the definition of
factorial, where
How would we define IfThis precisely specifies the value of N! for any natural
number N . It even tells how to compute it, without any
or and so on. Here is how we can express that mathematical definition in TLA+.
Since TLA+ and PlusCal don't let us write factorial(N) == IF N = 0 THEN 1 ELSE N * factorial(N1)If we put this definition in a module, the TLA+ parser will complain that the factorial in the ELSE expression is
undefined. That's because factorial is undefined
until after the definition. TLA+ (unlike most programming
languages) requires us to declare that this use of factorial
before its definition is intentional by preceding it with:
RECURSIVE factorial(_)With this definition, TLC can compute factorial(N)
for small enough values of N — values up to 12
on my computer. How big a value can it handle on your
computer?
The common form of a recursive definition of an operator
This kind of recursive definition is used
to define
Exercise 2 Define an operator Answer
RECURSIVE SumSeq(_) SumSeq(seq) == IF seq = << >> THEN 0 ELSE Head(seq) + SumSeq(Tail(seq))
Another example of this kind of recursive definition is
defining
SumSet(S) == IF S = {} THEN 0 ELSE x + SumSet(S \ {x}) for someBut how do we write some x in S
in math? In TLA+, it is written
CHOOSE x \in S : TRUESo, the definition can be written
RECURSIVE SumSet(_) SumSet(S) == IF S = {} THEN 0 ELSE (CHOOSE x \in S : TRUE) + SumSet(S \ {CHOOSE x \in S : TRUE})The following equivalent definition, using the LET /IN construct, is easier to read:
RECURSIVE SumSet(_) SumSet(S) == IF S = {} THEN 0 ELSE LET y == CHOOSE x \in S : TRUE IN y + SumSet(S \ {y})If you are not already familiar with CHOOSE , do not use
it for anything except this kind of recursive
definition. If you do, you will almost certainly use it
incorrectly. One place to learn how to use it correctly
is Section 6.6 of
Specifying
Systems.
1.4 Asserting Connectivity
We can now assert the assumption that possibility 4 does not
occur.
This means asserting the property that every two nodes
are connected by a path, in which case we say that the
graph is connected.
If every node is connected by a path to the root, then any two nodes
are connected by a path through the root. Therefore, the graph
is connected iff for any node, there is a path to it from the
root. (Remember that
How do we compute whether there is a path from the root to a node
What this computation does is find all nodes reachable by a path from
A simple way to do this is to define
ASSUME ReachableFrom({root}) = NodesLet SetNbrs(S) be the set of all neighbors of
nodes in the set S of nodes. The way we compute the
nodes reachable from the root is based on the observation that
ReachableFrom(S) equals
ReachableFrom(S \cup SetNbrs(S)) , for any set
S of nodes. This leads to a recursive definition
that differs from the recursive definitions we've seen thus far
because it defines ReachableFrom(S) in terms of
ReachableFrom(T) for a set T that is larger
than S rather than smaller than S . The
recursive computation stops when ReachableFrom(S) is as
big as it can be, which is when SetNbrs(S) is a subset of
S . This will eventually happen if the set of nodes
is finite, which is the only case that concerns us. So, we have
this definition:
RECURSIVE ReachableFrom(_) ReachableFrom(S) == LET R == SetNbrs(S) IN IF R \subseteq S THEN S ELSE ReachableFrom(R \cup S)This definition has to be preceded by the definition of SetNbrs . To define SetNbrs , we first
define the set Nbrs(n) to be the set of neighbors of a
node n . A neighbor of n is a node
m for which there is an edge joining m and
n . That edge must be the edge {m,n} , so
Nbrs(n) is the set of all nodes for which
{m,n} is an edge. In other words:
Nbrs(n) == {m \in Nodes : {m,n} \in Edges} SetNbrs is the union of all the sets Nbrs(n)
for n in Node . In other words:
SetNbrs(S) == UNION {Nbrs(n) : n \in S}
Exercise 3 Evaluating
Answer A more efficient algorithm maintains the set
RECURSIVE RF(_, _) RF(B, T) == IF B = {} THEN T ELSE RF(SetNbrs(B) \ T, B \cup T) ReachableFrom(S) can then be defined to equal
RF(S, {}) .
root is the only node. There is no reason
not to allow that case.
2 The Algorithm2.1 The General Idea
The purpose of our algorithm is to have each nonroot node
There's an obvious algorithm for having each nonroot computer choose its parent. Initially the parent of the root is itself and it doesn't matter who the parents of the other nodes are. The root sends a message to each of its neighbors. Each of its neighbors sends a message to its other neighbors, and so on. A nonroot node chooses as its parent the node that sent the first message it received. (Don't confuse these algorithm messages with the messages that are to be routed to the root.)
You should convince yourself that the parents computed by this
algorithm have the property that once a node has chosen its parent,
the parent path from that node leads to the root. Eventually,
every node will have chosen its parent and the algorithm will
terminate.
However, we require an algorithm to route messages along a
minimallength path to the root.
You should be able to see that the obvious algorithm doesn't do that.
If there are multiple paths (without cycles) from a node
Let the depth of a node be the length of its shortest path to
the root. Each computer is represented by a process that
maintains two values local to it:
Initially, For every nodeThe algorithm begins with the root sending a message containing the value 0 to all its neighbors. From then on, every node n does the following when it receives a
message:
Action of NodeA message that is not ignored causes the node n that
receives it to decrease the value of depth[n] .
However, the value of depth[n] is always ∞ or a
natural number, so it can be decreased only a finite number of
times. Eventually, any remaining messages will be
ignored. When all those messages have been received, the
algorithm terminates.
Correctness of the algorithm means that, when it terminates, the
parent path from any node is a minimallength path to the root.
The invariant described above implies that this is the case if
2.2 The PlusCal Code
We now turn the idea for an algorithm described above into
PlusCal.
We've decided that the algorithm will represent each computer as a
process, so the set of processes will be the set
m \prec n == IF m = Infty THEN FALSE ELSE IF n = Infty THEN TRUE ELSE m < nFor simplicity, instead of ∞, we will use a number bigger than the depth of any node. The depth of a node has to be smaller than the number of nodes. We therefore add to the module a declaration of the constant MaxNodes and the assumption
MaxNodes >= Cardinality(Nodes)which requires adding the module FiniteSets that
defines Cardinality to the
EXTENDS statement. We let the initial value of
depth equal 0 for the root node and equal
MaxNodes for all other nodes.
The initial value of the variable
A multiprocess algorithm needs one or more global variables that
processes can use to communicate with one another. In our
algorithm, processes communicate by messages, so we have to
decide how to represent the sending and receiving of the algorithm
messages.
A node
We let
In real programs, messages contain headers and checksums and other
things. That stuff doesn't belong in an algorithm, which should
describe only the relevant information in the message. The
information in a message of our algorithm is just a number, so
algorithm FindRoutes { variable msgs = [q \in Queues > IF q[1] = root THEN <<0>> ELSE << >>] ; fair process (node \in Nodes) variables depth = IF self = root THEN 0 ELSE MaxNodes , parent = self ; { the process's code } }
Let's now write the definition of
QueuesFrom(n) == {<<n, m>> : m \in Nbrs(n)} QueuesTo(n) == {<<m, n>> : m \in Nbrs(n)}You can now define Queues .
Exercise 4 Define the set Answer I can think of the following simple definitions:
Queues == UNION {QueuesFrom(n) : n \in Nodes} Queues == UNION {QueuesTo(n) : n \in Nodes} Queues == {e \in Nodes \X Nodes : {e[1], e[2]} \in Edges} The body of a process will therefore be a while (TRUE)
loop, each iteration of the loop receiving and acting on a single
message to that process.
The body of that loop for process n must choose an
arbitrary queue in QueuesTo(n) and process its first
message. It does that with a statement
where S is the set of all queues r
in QueuesTo(n) for which msgs[r] contains
at least one message.
Since the process's code is for the process with id self ,
all this is expressed by:
a: while (TRUE) { with (q \in {r \in QueuesTo(self) : msgs[r] /= << >>}) { if (Head(msgs[q]) < depth  1) { depth := Head(msgs[q]) + 1 ; parent := q[1] ; Compute changes to msgs } else { msgs[q] := Tail(msgs[q]) } } }To complete the algorithm, we just have to write the code to compute the changes to msgs . But first, let's take a
closer look at the with statement.
We saw in Session 4 that in a singleprocess algorithm,
if
await S /= { } ; with (q \in S) {...}
We now return to the problem of writing the code to
compute the changes to
Using a
In our algorithm, we want
msgs := [r \in Queues >
IF r = q THEN Tail(msgs[q])
ELSE IF r \in QueuesFrom(self) \ {<<
The righthand side of this assignment statement is the most
complicated mathematical expression we've encountered
so far.
I hope that by now you are not intimidated by such an expression, and
you can understand it by taking it apart piece by piece.
The complete algorithm is contained in module
2.3 Checking Correctness
You should have been using TLC to help you solve the exercises,
checking the correctness of your answers before looking at
the answers that I've provided. If opening module
Correctness of the algorithm means that the parent paths it constructs
are shortest paths to the root. But before checking that, we
should first check a correctness condition that every algorithm's
specification should satisfy:
It should
specify whether any behavior — that is, whether
any sequence of states — does or does not
satisfy the specification.
Failure to satisfy this condition causes TLC to report an
execution error.
For example, a specification does not satisfy the condition if it
allows the value of
a variable in some state to have the value
To create a model on which to run TLC, we first have to decide what
value to substitute for the set
{r, n1, n2, n3, n4, n5}and select Set of model values. (Don't select either Symmetry set or Type.) Anywhere else in the model, including in specifications of other declared constants and in invariants, the model values r, n1, ... , n5 can be used like
ordinary declared constants. Have the model set
root to r and set Edges to
{{r,n1}, {r,n2}, {n1,n3}, {n2,n3}, {n3,n4}, {n3,n5}}For both these substitutions, select Ordinary assignment. Draw a picture of the graph described by these sets of nodes and edges. Model values enable some features of TLC, the most important of which is taking advantage of symmetry to speed up model checking. You should read about model values in the Toolbox's Model Values and Symmetry help page. That page can be found by searching the Toolbox's Help for symmetry, or by clicking here.
Meanwhile, complete the model by assigning a big enough number to
We can take advantage of this error trace to check that the algorithm did what it was supposed to do. On your picture of the graph, draw an arrow from each nonroot node to its parent. Check that the parent path from each node is a shortest path to the root.
TLC's report of deadlock shows that there exists at least one
terminating behavior of the algorithm. We should check that
every behavior terminates. The algorithm has terminated
iff every message queue is empty. We want to show that
this is eventually true in every behavior. This condition
is expressed by the temporal formula
Terminated == \A q \in Queues : msgs[q] = << >>Add <>Terminated to the model's list of properties to be
checked and have TLC check that it's satisfied.
There is one additional property that should be checked for almost all algorithms: type correctness. PlusCal and TLA+ are untyped languages, and this has led some people to believe that I think type checking is not important. In fact, I believe that type correctness is very important — too important to be left to the primitive type systems and incomplete type checking provided by programming languages. Moreover, there is no need to add anything special for type checking because type correctness is expressed by an ordinary invariant.
Type correctness of an algorithm is expressed by an invariant asserting
that the value of each variable is a member of a set called
the variable's type. I like to name the typecorrectness
invariant
TypeOK == /\ depth \in [Nodes > Nat] /\ parent \in [Nodes > Nodes] /\ msgs \in [Queues > Seq(Nat)]Even this weak type invariant can't be expressed by type declarations in many popular programming languages. However, we can strengthen it to provide more information to a reader of the module, and so checking its invariance can catch more errors:
TypeOK == /\ depth \in [Nodes > 0..MaxNodes] /\ parent \in [Nodes > Nodes] /\ msgs \in [Queues > Seq(0..(Cardinality(Nodes)1))]This definition appears in module Session10 .
TLC can check that TypeOK is indeed an invariant of the
algorithm.
We can strengthen Exercise 5 Strengthen the definition of
Answer One way to do this is by adding the following conjunct to the definition:
/\ \A q \in Queues : \A i \in 1..Len(msgs[q]) : /\ \A j \in 1..(i1) : msgs[q][j] > msgs[q][i] /\ msgs[q][i] >= depth[q[1]] After checking type correctness, we have to check that the algorithm is actually correct. Our algorithm is supposed to perform a computation and terminate. We've already checked that it terminates. We now want to check that it computes the right thing. In other words, we want to check that it satisfies a postcondition.
We check that a singleprocess algorithm satisfies a postcondition
with an
Terminated => Postconditionfor the appropriate definition of Postcondition .
The postcondition our algorithm should satisfy is that the parent path
from every node is the shortest path from that node to the root.
Let
Postcondition == \A n \in Nodes : Len(PPath(n))  1 = Dist(n, root)To complete the module, we must define PPath and
Dist . You can find my definitions in
module Session10 , but you should try to write your
own definitions before looking at mine.
My definition of PPath(n) assumes that the parent
path from n ends at the root and isn't an infinite
sequence of nodes. My definition of Dist(n,m)
assumes that there is a path from n to m .
For fun, you might try to write definitions that don't make the
assumptions I've made. But first, you need to decide what their
values should be when those assumptions don't hold.
In addition to checking type correctness and the correctness condition, it's a good idea to check that the algorithm satisfies other invariants. Failure to satisfy an invariant could reveal an error in the algorithm even if the algorithm behaves correctly for the model you are checking. If the algorithm is correct, incorrectness of your invariant will reveal that the algorithm allows behaviors you didn't expect. Knowing this can help you avoid errors when implementing the algorithm. As an exercise, write and check the invariant of our routing algorithm described above. For this and the following exercises, check the algorithm on several graphs — starting with a small model and working up to larger models. The smallest model, which you should check, is a graph containing only the root. A small example that can catch many errors is the graph containing three nodes and three edges.
Exercise 6 As mentioned above, the algorithm does not depend on messages being delivered in order. Modify the algorithm so messages in the message queues can be received in any order. You should use the RemoveElt
operator you defined in
this exercise
of Intermezzo 1.
Exercise 7 In our algorithm, a process
Project Our algorithm eventually terminates, having computed minimallength paths from every node to the root. However, no node can tell when that has happened. Your problem is to enhance the algorithm so the root learns when computation of the paths has finished. Nodes will obviously have to send information to the root. This is not an easy problem. 2.4 An Observation Look at the TLA+ translation of the algorithm. Notice that
it does not have a
In the algorithm, a process repeatedly performs a single action,
described by the body of the back to top
