Mechanically Checked Safety Proof of a Byzantine Paxos Algorithm

Leslie Lamport

Last modified on 25 July 2020

This web page links to the formal PlusCal specifications of the algorithms, and the machine-checked TLA+ proofs of their correctness, that are presented informally in the paper Byzantizing Paxos by Refinement.  That paper describes an algorithm BPCon that abstracts and generalizes the consensus algorithm at the heart of the algorithm described in the seminal paper Practical Byzantine Fault Tolerance and Proactive Recovery by Miguel Castro and Barbara Liskov (ACM Transactions on Computer Systems 20, 4 (2002), pages 398-461).  BPCon also admits two other implementations: a simple one that employs digital signatures and a previously unpublished one that, like the Castro-Liskov algorithm, relies only on message authenticators.  It is derived by Byzantizing algorithm PCon, a variant of the classic Paxos consensus algorithm first published in The Part-Time Parliament by Leslie Lamport, (ACM Transactions on Computer Systems 16, 2 (1998), pages 133-169).  (Paxos also appeared without proof in Brian Oki's Ph.D. thesis.)  Although never presented in this form, PCon was essentially used as part of two previous algorithms: Vertical Paxos and an unpublished (but patented) version of Cheap Paxos.

What We've Done       [show]

We have proved the safety properties of algorithm BPCon by showing that its safety specification refines (implements) that of algorithm PCon. The proof has been checked by TLAPS, the TLA+ proof system.  The initial version of the proof was written and checked in 2011.  Since then, improvements to TLAPS have allowed the proof to be shortened---mainly by Stephan Merz.

We have also written the following two additional specifications. 

  • Consensus: A trivial, high-level specification of consensus.
  • Voting: A high-level consensus algorithm that describes ballots and voting that underlie the classic Paxos consensus algorithm.
We have proved that the Voting algorithm refines the Consensus specification (including liveness).  The proofs of the safety properties have been completely checked by TLAPS, but some steps in the liveness proofs have not because TLAPS cannot yet check them.

Algorithm PCon refines the Voting algorithm.  The safety part of this refinement, as well as of the refinement of Consensus by Voting, has been checked by the TLC model checker.

The Specifications       [show]

Our specifications are all written as algorithms in the PlusCal algorithm language, which are automatically translated into TLA+.  Click here for an explanation of the PlusCal code. This code, and its translation, specify only safety.  The liveness properties of specifications Consensus and Voting are written directly in TLA+.  (Constructs for specifying liveness were added to PlusCal after these specifications were written, but they are not general enough to specify the rather subtle liveness assumptions needed by the algorithms.)  We have not written formal liveness properties for BPCon and PCon.  The specifications and their pretty-printed versions are in the following files.
Consensus       Consensus.tla Consensus.pdf
Voting   VoteProof.tla Voteproof.pdf.
PCon   PConProof.tla PConProof.pdf
BPCon   BPConProof.tla          BPConProof.pdf
They are also all in this compressed zip file.

All the specifications have extensive comments.  If you are interested primarily in the specifications and proofs, then you should read them in order, starting with the Consensus specification.  If you are primarily interested in Byzantine Paxos, unless you are already familiar with TLA+ and PlusCal, you should still read the Consensus specification first.  You might then want to go directly to BPConProof and refer back to the other specifications as needed.

Verifying the Refinements       [show]

Each algorithm refines (implements) the previous one under a suitable refinement mapping.  Before attempting any proofs, we checked the safety part of each refinement except the last with the TLC model checker, using large enough models that we felt confident that there were no coding errors.  Our understanding of these algorithms is good enough that we were confident that they were basically correct before writing any proofs.  We also model checked BPCon to find simple errors, but TLC cannot check it on large enough models to give any confidence in correctness of the algorithm.

The final refinement step, that BPCon refines PCon, has been proved and the proof completely checked by TLAPS, except that a few simple mathematical facts about finite sets are assumed without proof--for example, that a finite, non-empty set of integers contains a maximal element.  These assumptions were checked with TLC to avoid possible errors in their statements.  Theorems in library modules developed since the proofs were written would now make it easy to prove those statements.

We have written a proof of the first refinement step, that Voting refines Consensus, including the liveness properties.  All safety properties were completely checked, except that we assumed a few of the same kind of simple mathematical facts as in the refinement proof of BPCon, where possible checking them with TLC.  It would be nice to complete the entire refinement sequence by proving that PCon refines the safety part of the Voting spec.  However, there is little motivation to do that because: (i) we already believe that PCon is correct, (ii) its proof would be less of a test of our ability to write and machine check such proofs than the refinement proof of BPCon, and (iii) a proof of correctness of ordinary Paxos has already been written and checked with TLAPS.

The proof that the Voting spec implements the liveness property of Consensus has not been completely checked, since TLAPS cannot yet reason about TLA's ENABLED operator or do non-propositional temporal logic reasoning. 

This page can be found by searching the Web for the 23-letter string uidlamportbyzpaxosproof.  Please do not put this string in any document that could wind up on the web--including email messages and Postscript and Word documents.  You can refer to it in Web documents as the string obtained by removing the - from uid-lamportbyzpaxosproof.