Session 9 Liveness
Leslie Lamport
Last modified on Mon 19 August 2024 at 9:15:20 PST by lamport -->
|
|
1 Safety versus Liveness
Invariance of MutualExclusion is trivially satisfied by
an algorithm in which no process ever enters its critical
section.
Invariance belongs to a class of properties called safety
properties. A safety property is one that, if it is
not satisfied by an execution, then you can tell that it's not
satisfied by looking at a finite portion of the execution—even
if the execution is infinite. A formula is not invariant in an
execution iff it is not satisfied in some state of the execution, and
you can tell that by looking just at the execution up to and including
that state.
Termination is an example of a different class of properties called
liveness properties. A liveness property is one for which
you have to look at the entire execution, which may be infinite, to
determine that it is not satisfied. Termination means
that a terminating state is eventually reached. For an infinite
execution, to determine that a terminating state isn't reached just
by looking at the execution, you have to look at the entire
execution. Knowing that it's an execution of a certain
algorithm, you may be able to infer that it will never reach such a
state. But just by observing the execution, without knowing what
it's an execution of, you can't tell that it does not reach a state
satisfying some condition without seeing the complete execution.
If you look at the translation of algorithm Add2 or
the other algorithms in Session 7, you will see that they define
the property Termination by
Termination == <>(\A self \in ProcSet: pc[self] = "Done")
where ProcSet is defined to be the set
{3,-7} of process ids. This defines
Termination to be a temporal formula, meaning it is true
or false of a sequence of states. Mathematically, a state
is an assignment of values to variables, and a sequence of states is
called a behavior. We describe the execution of an
algorithm as a behavior. But when we discuss temporal formulas,
it's important to remember that a behavior just means a sequence of
states, not necessarily one generated by executing an algorithm.
Formula Termination is true of a behavior
iff its subformula
\A self \in ProcSet: pc[self] = "Done"
is true of at least one state of the behavior. For a behavior
allowed by a multiprocess PlusCal algorithm, this formula can be true
for at most one state—namely, the final state. But there
are lots of behaviors not allowed by any PlusCal algorithm for which
the formula is true for more than one state. There are also lots
of behaviors in which Termination is not true, but we
have to look at the entire behavior to be sure that it's one of them;
so Termination is a liveness property.
Formula Termination has the form <>P where
P is a formula that is true or false of a state. A
formula that is true or false of a state is called a state
predicate. A state predicate is just an ordinary
Boolean-valued formula that can contain variables but no primed
variables and no temporal operators like <> .
For any state predicate P , the formula <>P
is true of a behavior iff P is true of at least one state
of the behavior. The operator <> , pretty-printed
◇, is read eventually.
2 Alternate
The operator <> by itself just asserts that something
happens at least once. For
,
we want processes to enter the critical section not just once but
repeatedly. We want the code to say that the algorithm keeps
taking steps, and we want to check that processes keep entering the
critical section. Nothing runs forever; but just as the infinite set
of all integers is a convenient abstraction even though all the
integers can never be written, allowing a behavior of algorithm
Alternate to consist of infinitely many states is a
convenient abstraction.
click here or type a letter to hide
We want a behavior of algorithm Alternate to be
infinite, which means each process must be in its critical
section infinitely many times. For process 0 to be
in its critical section infinitely many times means that
formula pc[0]="cs" must be true on infinitely
many states of a behavior of the algorithm.
The temporal property asserting that pc[0]="cs"
is true on infinitely many states of a behavior is written
[]<>(pc[0]="cs") .
The symbol [] is pretty-printed ◻ and is read
always. If we think of the sequence of states of a
behavior representing the state of the system at successive times,
then the formula []<>P , which asserts that P
is true in infinitely many states of the behavior, asserts that
P is true at infinitely many times. For this
reason, []<> is usually read as infinitely often.
However, it's a mistake to think that a step from one state to
the next in a behavior necessarily represents time advancing.
I haven't explained what []<>P asserts about a finite
behavior. An easy way to think about what a temporal formula
means for finite behaviors is to pretend that a finite behavior is an
infinite behavior in which the final state is repeated forever.
This pretense tells us that []<>P is true for a finite
behavior iff P is true of the last state of that
behavior.
(Ideas introduced in Session 11 imply that this is more than just a
convenient pretense.)
The liveness property we want algorithm Alternate
to satisfy is that each process is in its critical section
infinitely often. Since Procs is the set
of processes, this is expressed by this property:
\A i \in Procs : []<>(pc[i] = "cs")
It is a liveness property because we have to look at the entire
behavior to know if it is not the case that
pc[i] = "cs"
is true in infinitely many states of the behavior.
For any algorithm, we have to add a fairness requirement to require
that it does something. The fairness requirement most often
added for multiprocess algorithms is weak fairness for each
process. Weak fairness for a process roughly means that the
process must keep taking steps if it can. I will explain
precisely what it means below. We require weak fairness of a
process by putting the keyword fair (followed by at
least one space) before the keyword process . For a
process set, this means weak fairness is required for each process in
the set.
Add the weak fairness requirement to the process set of algorithm
Alternate . Add to a model for the algorithm the
liveness property above to the Properties subfield of the
What to check? field of the Model Overview tab.
Run TLC to check that the algorithm satisfies this property.
3 1BitMutex
A mutual exclusion solution should not require that a process enter
its critical section infinitely often. A process should be
allowed to halt at any time in its noncritical section.
Consider
.
Halting in its noncritical section means that control in a process
remains forever at label ncs .
Weak fairness of a process would not allow that.
We don't want to allow a process to remain forever at any other label,
only at ncs .
I will call this requirement weak fairness of the process except
at ncs .
It is specified by a fair process declaration with the
modifier - added to the label ncs . The
beginning of the process becomes:
fair process (P \in Procs) {
ncs:- while (TRUE) {
skip ;
The modifier - comes after the : that
follows the label. (There can be spaces before and/or after
the : .) Open your specification containing
algorithm 1BitMutex and make this modification to it.
Some statements of the mutual exclusion problem allow a process to remain
forever in its critical section. In that case, we would also
add the - modifier to the label cs .
However, that complicates the liveness properties that a solution
should have, so we will require that a process with control
at cs eventually executes that statement, leaving the
critical section.
Algorithm 1BitMutex ensures that if process 0 begins
executing the entry code, then it will enter the critical
section. We can state that more precisely as:
If pc[0] equals "enter" , then
it eventually equals "cs" . This condition
is
expressed in TLA+ with the temporal operator ~> ,
pretty printed
and read leads to. For state predicates P
and Q , the formula P ~> Q is true of a
behavior iff for every state of the behavior in which P
is true, Q is true in that state or a subsequent
state. The liveness guarantee that the algorithm ensures
for process 0 is:
(pc[0] = "enter") ~> (pc[0] = "cs")
click here or type a letter to hide
Open a model for the algorithm and add this formula to the
Properties to be checked on the Model Overview
page. Run TLC to check that the algorithm satisfies this
property.
The algorithm does not satisfy the corresponding condition for
process 1. Have TLC check that property:
(pc[1] = "enter") ~> (pc[1] = "cs")
Examine the error trace reported by TLC to see why 1BitMutex
doesn't satisfy the property.
Algorithm 1BitMutex satisfies
.
This requirement can also be expressed as follows:
Whenever there is some process in the entry code, some process must
then or later be in the critical section.
This condition is confusingly called deadlock freedom . It
actually prohibits deadlock and livelock, so I will give it the
awkward name DeadAndLivelockFree .
If Procs is the set of processes, it is defined by:
DeadAndLivelockFree ==
(\E i \in Procs : pc[i] = "enter") ~> (\E i \in Procs : pc[i] = "cs")
click here or type a letter to hide
Add this definition to your module, add
DeadAndLivelockFree to the temporal properties checked by
the model, and have TLC check that it is satisfied by
1BitMutex . Also check that the
N -process version you wrote in Exercise 4 of
Session 8 satisfies the property.
4 Starvation Freedom
In Session 8, an algorithm was defined to be starvation free if any
process that begins executing the entry code eventually enters the
critical section. For our mutual exclusion algorithms, where
Procs is the set of processes, this property is expressed in
TLA+ as:
StarvationFree == \A i \in Procs : (pc[i] = "enter") ~> (pc[i] = "cs")
Modify
to make each process weakly fair except at ncs , and
have TLC check that it satisfies property StarvationFree .
Do the same for the N -process generalization
in Exercise 5 of Session 8.
click here or type a letter to hide
5 Semaphore Liveness
of Session 7 uses a semaphore variable sem
to implement mutual exclusion, where the critical section consists
of statements ab1 and ab2 .
We can obviously use a semaphore to implement a mutual exclusion
algorithm like this:
click here or type a letter to hide
--algorithm SemMutex {
variable sem = 1 ;
fair process (P \in Procs) {
ncs:- while (TRUE) {
skip ;
enter: await sem = 1 ;
sem := 0 ;
cs: skip ;
exit: sem := 1 ;
}
}
}
where Procs can be any finite set.
Use TLC to check that this algorithm satisfies mutual exclusion and
also the property DeadAndLivelockFree . It does not
satisfy StarvationFree . Have TLC check starvation
freedom and examine the error trace that it produces. TLC finds
a behavior that reaches a point at which one or more processes
remain forever at the enter statement, while one or
more other processes repeatedly enter and exit the critical section.
When I just tried it, process 0 was one process stuck forever
at enter .
We say that the enter step of process 0 is
enabled in a state iff it is possible for 0 to
take an enter step starting in that state. It is
enabled in states in which pc[0] equals
"enter" and sem equals 1.
(Even though code implementing statement enter might keep
reading the semaphore when it equals 0, that is not
considered an enter step.)
The behavior found by TLC contains infinitely many states in which
process 0's enter step is enabled. (More precisely,
it contains a finite number of such states that occur infinitely
many times.)
However, the behavior also contains infinitely many states in which
that step is not enabled. Weak fairness only requires a process
0 to take an enter step if that step will remain
continuously enabled until it takes that step.
The fair process declaration requires that for every
label lbl not modified by - , the step
lbl of its process (or each of its processes for a
process set) is weakly fair in every behavior of the algorithm.
A step of a process is strongly fair in a behavior iff
there is no point in the behavior after which the step is enabled in
infinitely many states but no such step occurs in the behavior.
In the behavior found by TLC, at some point there are infinitely many
states in which an enter step of process 0 is enabled,
but such a step is never performed. Strong fairness of that step
would forbid such a behavior.
To require strong fairness of an sp step in a
fair process, we modify the sp label by
adding a + after it. Modify algorithm
SemMutex by changing the enter statement
to:
enter:+ await sem = 1 ;
sem := 0 ;
Run TLC and check that that the algorithm now satisfies
StarvationFree .
Strong fairness implies weak fairness, since if a step
is enabled from some point on, then it is enabled in
infinitely many states.
(Remember we are pretending that terminating executions are
represented by behaviors in which the last state is repeated
forever.)
Strong fairness is a stronger requirement than weak fairness because
it requires the step to eventually be taken if it becomes enabled
infinitely often, even if it also becomes disabled infinitely often.
Both weak and strong fairness are liveness properties because we can
tell if they are not satisfied by a behavior only by examining the
complete behavior. (Remember that for determining liveness, we
don't assume that the behavior is produced by any algorithm.)
Strong and weak fairness of a step sp in a process
i are equivalent if the step cannot be disabled or
enabled by a step of another process. This is usually the case
if sp does not contain an await
statement. However, for reasons we'll see later, any execution
of a statement sp that does not change the state is not
considered to be an sp step for purposes of determining
weak and strong fairness. For example, the following statement
is completely equivalent to the enter statement above.
enter:+ if (sem = 1) {
sem := 0 ;
goto cs
}
else {
goto enter
} ;
An execution of this statement when sem equals 0 does not
change the state, so it is not considered an enter step
when interpreting the definition of strong fairness.
Replace the enter statement in algorithm
SemMutex with this version and check with TLC that the
algorithm still satisfies StarvationFree .
6 What Good is Liveness?
A liveness property asserts that something eventually happens.
It is sometimes argued that such a property is useless because it
doesn't say when that something will happen.
What good is knowing that a program eventually terminates if it might
run for a billion years?
The assertion that something happens in less than a billion years is
a safety property. You just have to look at the (finite)
part of the behavior that represents the first billion years of
execution to know that it hasn't happened.
We can model the real-time behavior of an algorithm with
PlusCal. (One way is to represent time by a variable and have
a process that does nothing but increase the variable's
value.) However, it's not worth the effort
in most applications.
Liveness properties are a useful compromise between just checking
safety properties like postconditions and mutual exclusion,
and checking how long it takes an algorithm to terminate or
a process to enter its critical section.
Understanding why an algorithm satisfies a liveness property
can help you figure out if it does things quickly enough.
Moreover, learning that an algorithm does not satisfy a liveness
property provides important information. An algorithm that never
terminates is not going to terminate in a billion years. When
implementing an algorithm, you should know if it satisfies a liveness
property only under a strong fairness assumption about a step.
One common use of time in systems is implementing timeouts to recover
from errors such as lost messages. Almost always, such timeouts
can be modeled as events that can happen at any time. Exactly
when a timeout occurs generally affects performance, but not
correctness. An error that occurs because a timeout happens at
the wrong time most likely indicates an error that can occur in the
implementation even when timeouts occur at the right time.
|