1 Safety versus Liveness

2 Alternate

3 1BitMutex

4 Starvation Freedom

5 Semaphore Liveness

6 What Good is Liveness?

Prev:  Session 8

Next:  Intermezzo 1

Contents
 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

 ↑

top

Session 9   Liveness

Leslie Lamport

Last modified on Mon 19 August 2024 at 9:15:20 PST by lamport -->


1  Safety versus Liveness

Invariance of MutualExclusion is trivially satisfied by an algorithm in which no process ever enters its critical section.  Invariance belongs to a class of properties called safety properties.  A safety property is one that, if it is not satisfied by an execution, then you can tell that it's not satisfied by looking at a finite portion of the execution—even if the execution is infinite.  A formula is not invariant in an execution iff it is not satisfied in some state of the execution, and you can tell that by looking just at the execution up to and including that state.

Termination is an example of a different class of properties called liveness properties.  A liveness property is one for which you have to look at the entire execution, which may be infinite, to determine that it is not satisfied.  Termination means that a terminating state is eventually reached.  For an infinite execution, to determine that a terminating state isn't reached just by looking at the execution, you have to look at the entire execution.  Knowing that it's an execution of a certain algorithm, you may be able to infer that it will never reach such a state.  But just by observing the execution, without knowing what it's an execution of, you can't tell that it does not reach a state satisfying some condition without seeing the complete execution.

If you look at the translation of algorithm Add2 or the other algorithms in Session 7, you will see that they define the property Termination by

Termination  ==  <>(\A self \in ProcSet: pc[self] = "Done")
where ProcSet is defined to be the set {3,-7} of process ids.  This defines Termination to be a temporal formula, meaning it is true or false of a sequence of states.  Mathematically, a state is an assignment of values to variables, and a sequence of states is called a behavior.  We describe the execution of an algorithm as a behavior.  But when we discuss temporal formulas, it's important to remember that a behavior just means a sequence of states, not necessarily one generated by executing an algorithm.

Formula Termination is true of a behavior iff its subformula

\A self \in ProcSet: pc[self] = "Done"
is true of at least one state of the behavior.  For a behavior allowed by a multiprocess PlusCal algorithm, this formula can be true for at most one state—namely, the final state.  But there are lots of behaviors not allowed by any PlusCal algorithm for which the formula is true for more than one state.  There are also lots of behaviors in which Termination is not true, but we have to look at the entire behavior to be sure that it's one of them; so Termination is a liveness property.

Formula Termination has the form <>P where P is a formula that is true or false of a state.  A formula that is true or false of a state is called a state predicate.  A state predicate is just an ordinary Boolean-valued formula that can contain variables but no primed variables and no temporal operators like <>.  For any state predicate P, the formula <>P is true of a behavior iff P is true of at least one state of the behavior.  The operator <>, pretty-printed ◇, is read eventually.

2  Alternate

The operator <> by itself just asserts that something happens at least once.  For algorithm Alternate of Session 8, we want processes to enter the critical section not just once but repeatedly.  We want the code to say that the algorithm keeps taking steps, and we want to check that processes keep entering the critical section.  Nothing runs forever; but just as the infinite set of all integers is a convenient abstraction even though all the integers can never be written, allowing a behavior of algorithm Alternate to consist of infinitely many states is a convenient abstraction. 

We want a behavior of algorithm Alternate to be infinite, which means each process must be in its critical section infinitely many times.  For process 0 to be in its critical section infinitely many times means that formula pc[0]="cs" must be true on infinitely many states of a behavior of the algorithm.  The temporal property asserting that pc[0]="cs" is true on infinitely many states of a behavior is written []<>(pc[0]="cs").  The symbol [] is pretty-printed ◻ and is read always.  If we think of the sequence of states of a behavior representing the state of the system at successive times, then the formula []<>P, which asserts that P is true in infinitely many states of the behavior, asserts that P is true at infinitely many times.  For this reason, []<> is usually read as infinitely often.  However, it's a mistake to think that a step from one state to the next in a behavior necessarily represents time advancing.

I haven't explained what []<>P asserts about a finite behavior.  An easy way to think about what a temporal formula means for finite behaviors is to pretend that a finite behavior is an infinite behavior in which the final state is repeated forever.  This pretense tells us that []<>P is true for a finite behavior iff P is true of the last state of that behavior.  (Ideas introduced in Session 11 imply that this is more than just a convenient pretense.) 

The liveness property we want algorithm Alternate to satisfy is that each process is in its critical section infinitely often.  Since Procs is the set of processes, this is expressed by this property:

\A i \in Procs : []<>(pc[i] = "cs")
It is a liveness property because we have to look at the entire behavior to know if it is not the case that pc[i] = "cs" is true in infinitely many states of the behavior.

For any algorithm, we have to add a fairness requirement to require that it does something.  The fairness requirement most often added for multiprocess algorithms is weak fairness for each process.  Weak fairness for a process roughly means that the process must keep taking steps if it can.  I will explain precisely what it means below.  We require weak fairness of a process by putting the keyword fair (followed by at least one space) before the keyword process.  For a process set, this means weak fairness is required for each process in the set. 

Add the weak fairness requirement to the process set of algorithm Alternate.  Add to a model for the algorithm the liveness property above to the Properties subfield of the What to check? field of the Model Overview tab.  Run TLC to check that the algorithm satisfies this property.

1BitMutex

A mutual exclusion solution should not require that a process enter its critical section infinitely often.  A process should be allowed to halt at any time in its noncritical section.  Consider algorithm 1BitMutex of Session 8.  Halting in its noncritical section means that control in a process remains forever at label ncs.  Weak fairness of a process would not allow that.  We don't want to allow a process to remain forever at any other label, only at ncs.  I will call this requirement weak fairness of the process except at ncs.  It is specified by a fair process declaration with the modifier - added to the label ncs.  The beginning of the process becomes:

  fair process (P \in Procs) {
   ncs:- while (TRUE) {
           skip ;
The modifier - comes after the : that follows the label.  (There can be spaces before and/or after the :.)  Open your specification containing algorithm 1BitMutex and make this modification to it. 

Some statements of the mutual exclusion problem allow a process to remain forever in its critical section.  In that case, we would also add the - modifier to the label cs.  However, that complicates the liveness properties that a solution should have, so we will require that a process with control at cs eventually executes that statement, leaving the critical section.

Algorithm 1BitMutex ensures that if process 0 begins executing the entry code, then it will enter the critical section.  We can state that more precisely as:  If pc[0] equals "enter", then it eventually equals "cs".  This condition is expressed in TLA+ with the temporal operator ~>, pretty printed and read leads to.  For state predicates P and Q, the formula P ~> Q is true of a behavior iff for every state of the behavior in which P is true, Q is true in that state or a subsequent state.  The liveness guarantee that the algorithm ensures for process 0 is:

(pc[0] = "enter") ~> (pc[0] = "cs")
Open a model for the algorithm and add this formula to the Properties to be checked on the Model Overview page.  Run TLC to check that the algorithm satisfies this property.

The algorithm does not satisfy the corresponding condition for process 1.  Have TLC check that property:

(pc[1] = "enter") ~> (pc[1] = "cs")
Examine the error trace reported by TLC to see why 1BitMutex doesn't satisfy the property.

Algorithm 1BitMutex satisfies Dijkstra's third requirement.  This requirement can also be expressed as follows:

Whenever there is some process in the entry code, some process must then or later be in the critical section.
This condition is confusingly called deadlock freedom.  It actually prohibits deadlock and livelock, so I will give it the awkward name DeadAndLivelockFree.  If Procs is the set of processes, it is defined by:

DeadAndLivelockFree == 
  (\E i \in Procs : pc[i] = "enter") ~> (\E i \in Procs : pc[i] = "cs")
Add this definition to your module, add DeadAndLivelockFree to the temporal properties checked by the model, and have TLC check that it is satisfied by 1BitMutex.  Also check that the N-process version you wrote in Exercise 4 of Session 8 satisfies the property.

4  Starvation Freedom

In Session 8, an algorithm was defined to be starvation free if any process that begins executing the entry code eventually enters the critical section.  For our mutual exclusion algorithms, where Procs is the set of processes, this property is expressed in TLA+ as:

StarvationFree == \A i \in Procs : (pc[i] = "enter") ~> (pc[i] = "cs")
Modify Peterson's algorithm to make each process weakly fair except at ncs, and have TLC check that it satisfies property StarvationFree.  Do the same for the N-process generalization in Exercise 5 of Session 8.

5  Semaphore Liveness

Algorithm Add2FGSem of Session 7 uses a semaphore variable sem to implement mutual exclusion, where the critical section consists of statements ab1 and ab2.  We can obviously use a semaphore to implement a mutual exclusion algorithm like this:

--algorithm SemMutex {
   variable sem = 1 ; 
   fair process (P \in Procs) {
     ncs:- while (TRUE) {
             skip ;
      enter: await sem = 1 ;
             sem := 0 ;
         cs: skip ;
       exit: sem := 1 ;
           }
   }
}
where Procs can be any finite set. 

Use TLC to check that this algorithm satisfies mutual exclusion and also the property DeadAndLivelockFree.  It does not satisfy StarvationFree.  Have TLC check starvation freedom and examine the error trace that it produces.  TLC finds a behavior that reaches a point at which one or more processes remain forever at the enter statement, while one or more other processes repeatedly enter and exit the critical section.  When I just tried it, process 0 was one process stuck forever at enter.

We say that the enter step of process 0 is enabled in a state iff it is possible for 0 to take an enter step starting in that state.  It is enabled in states in which pc[0] equals "enter" and sem equals 1.  (Even though code implementing statement enter might keep reading the semaphore when it equals 0, that is not considered an enter step.)  The behavior found by TLC contains infinitely many states in which process 0's enter step is enabled.  (More precisely, it contains a finite number of such states that occur infinitely many times.)  However, the behavior also contains infinitely many states in which that step is not enabled.  Weak fairness only requires a process 0 to take an enter step if that step will remain continuously enabled until it takes that step.  The fair process declaration requires that for every label lbl not modified by -, the step lbl of its process (or each of its processes for a process set) is weakly fair in every behavior of the algorithm. 

A step of a process is strongly fair in a behavior iff there is no point in the behavior after which the step is enabled in infinitely many states but no such step occurs in the behavior.  In the behavior found by TLC, at some point there are infinitely many states in which an enter step of process 0 is enabled, but such a step is never performed.  Strong fairness of that step would forbid such a behavior.  To require strong fairness of an sp step in a fair process, we modify the sp label by adding a + after it.  Modify algorithm SemMutex by changing the enter statement to:

     enter:+ await sem = 1 ;
             sem := 0 ;
Run TLC and check that that the algorithm now satisfies StarvationFree.

Strong fairness implies weak fairness, since if a step is enabled from some point on, then it is enabled in infinitely many states.  (Remember we are pretending that terminating executions are represented by behaviors in which the last state is repeated forever.)  Strong fairness is a stronger requirement than weak fairness because it requires the step to eventually be taken if it becomes enabled infinitely often, even if it also becomes disabled infinitely often.  Both weak and strong fairness are liveness properties because we can tell if they are not satisfied by a behavior only by examining the complete behavior.  (Remember that for determining liveness, we don't assume that the behavior is produced by any algorithm.)

Strong and weak fairness of a step sp in a process i are equivalent if the step cannot be disabled or enabled by a step of another process.  This is usually the case if sp does not contain an await statement.  However, for reasons we'll see later, any execution of a statement sp that does not change the state is not considered to be an sp step for purposes of determining weak and strong fairness.  For example, the following statement is completely equivalent to the enter statement above.

        enter:+ if (sem = 1) {
                  sem := 0 ;
                  goto cs 
                } 
                else {
                  goto enter
                } ;
An execution of this statement when sem equals 0 does not change the state, so it is not considered an enter step when interpreting the definition of strong fairness.  Replace the enter statement in algorithm SemMutex with this version and check with TLC that the algorithm still satisfies StarvationFree.

6  What Good is Liveness?

A liveness property asserts that something eventually happens.  It is sometimes argued that such a property is useless because it doesn't say when that something will happen.  What good is knowing that a program eventually terminates if it might run for a billion years? 

The assertion that something happens in less than a billion years is a safety property.  You just have to look at the (finite) part of the behavior that represents the first billion years of execution to know that it hasn't happened.  We can model the real-time behavior of an algorithm with PlusCal.  (One way is to represent time by a variable and have a process that does nothing but increase the variable's value.)  However, it's not worth the effort in most applications. 

Liveness properties are a useful compromise between just checking safety properties like postconditions and mutual exclusion, and checking how long it takes an algorithm to terminate or a process to enter its critical section.  Understanding why an algorithm satisfies a liveness property can help you figure out if it does things quickly enough.  Moreover, learning that an algorithm does not satisfy a liveness property provides important information.  An algorithm that never terminates is not going to terminate in a billion years.  When implementing an algorithm, you should know if it satisfies a liveness property only under a strong fairness assumption about a step.

One common use of time in systems is implementing timeouts to recover from errors such as lost messages.  Almost always, such timeouts can be modeled as events that can happen at any time.  Exactly when a timeout occurs generally affects performance, but not correctness.  An error that occurs because a timeout happens at the wrong time most likely indicates an error that can occur in the implementation even when timeouts occur at the right time.