12 September 1996

This note informally describes the syntax and semantics of TLA+ down to, but not including, the level of expressions. Its main purpose is to introduce and explain the constructs for including modules that were introduced in August, 1996.

- Modules, Declarations, and Definitions
- Including One Module in Another
- Modules Inside Modules
- Local Definitions
- Higher-Order Operators
- Assumptions and Theorems
- Expression Levels
- A Closer Look at Instantiation
- Notes

A TLA+ specification is organized as a collection of modules. A module looks like this: (Note 1)

----------- MODULE DirectedGraphs ------------ body ---------------------------------------------- more body ==============================================The body consists of a sequence of statements, where a statement is a declaration, definition, assumption, or theorem. We will ignore assumptions and theorems for now. Horizontal lines like---------------------------------------------can appear between statements; they are purely decorative.A declaration statement looks like this:

CONSTANTS Node, EdgeIt adds to the module the declarations ofNodeandEdgeas constant symbols. TLA+ also allows variable symbols, identified by the keyword "VARIABLES". (Note 2) A declared symbol is a "free parameter" of the module.A definition statement looks like this:

NonEdge == (Node \X Node) \ EdgeAssuming thatEdgeandNodeare declared as above, this statements adds to the module the definition that defines the symbolNonEdgeto equal the expression(Node \X Node) \ EdgeWe will not worry about the meaning of this and other expressions.We can follow this definition statement with the definition statement

NonTrivialNonEdge == NonEdge \ {<<n, n>> : n \in Node}This defines the symbolNonTrivialNonEdgeto be the expression((Node \X Node) \ Edge) \ {<<$n, $n>> : $n \in Node}Observe that to obtain this definition, we replace the defined symbolNonEdgeby its definition, and we replace the bound variablenwith a new symbol$nthat cannot be declared or defined by the user.In this way, a definition always defines a symbol to equal an expression containing only declared symbols, bound variables that are different from any symbols typed by the user, and the built-in operators of TLA+ like

\Xand\.Symbols can also be defined to equal operators that take arguments. (Note 3) For example,

Nbrs(n, m) == <<n, m>> \in Edgedefines the symbolNbrsto be an operator that assigns to any expressionsexp_1andexp_2the expression<<exp_1, exp_2>> \in EdgeWe will denote this operatorLAMBDA $n, $m : <<$n, $m>> \in \ EdgeWe useLAMBDAonly to describe the semantics of modules; it is not an operator of TLA+. (As before, we replaced the bound variablesnandmby "untypable" symbols$nand$m.) (Note 4)TLA+ allows you to define prefix, infix, and postfix operators. For example, writing

n ?? m == <<n, m>> \in Edgedefines the infix operator??to be the same as the ordinary operatorNbrs. (Note 5)In a definition, the expression to the right of the "

==" can contain only TLA+ primitives, symbols that have already been defined or declared, and definition parameters (for example, the symbolsnandmin the definition ofNbrs). Circular definitions are not allowed. (Recursive definitions will be discussed elsewhere.)

One builds large hierarchical specifications by building a new module "on top of" old modules. One way to do this is with the

EXTENDSstatement at the beginning of the module. The statementEXTENDS Foo, Barsimply adds the declarations and definitions from modulesFooandBar(Note 6) to the current module. This has approximately (Note 7) the same effect as inserting the body of those modules in the current module.The other way to use an existing module is with the

INSTANCEstatement, which instantiates (substitutes for) the module's declared symbols. Suppose that moduleDirectedGraphsdeclares the symbolsNodeandEdge, and defines the symbolsNonEdgeandNbrs, as above. Consider the following module:------------------ MODULE SGraphs ------------------ CONSTANT S Edge == { <<m, n>> \in S \X S : m \in n } INSTANCE DirectedGraphs WITH Node <- S, Edge <- Edge ====================================================TheINSTANCEstatement adds to moduleSGraphsall definitions obtained from the definitions ofDirectedGrahsby substitutingSfor the symbolNodeand substituting the definition ofEdge(from moduleSGraphs) for the symbolEdge. Thus, theINSTANCEstatement adds the following definitions to moduleSGraphs:

NonEdgeis defined to equal(S \X S) \ { <<$m, $n>> \in S \X S : $m \in $n }Nbrsis defined to equalLAMBDA $n, $m : <<$n, $m>> \in { <<$m, $n>> \in S \X S : $m \in $n }Substitutions such as

Edge <- Edge, in which a symbol is substituted for itself, are quite common. We therefore introduce the convention that if no substitution for a declared symbolsof an instantiated module appears in theWITHclause, then there is an implicits <- s. (This means thatsmust already be declared or defined in the current module.) Thus, theINSTANCEstatement of moduleSGraphsabove can be writtenINSTANCE DirectedGraphs WITH Node <- SIf there are no explicit substitutions, the "WITH" is omitted.A module may need to use more than one instantiation of the same module. For example, we may want to use two different instances of the

DirectedGraphsmodule, with different substitutions for the parametersNodeandEdge. We can't let the same symbol have two different definitions, so we must rename the defined symbols. The statementFoo == INSTANCE DirectedGraphs WITH Node <- Sis the same as theINSTANCEstatement in moduleSGraphsabove except that it defines the symbolsFoo.NonEdgeandFoo.Nbrsinstead of the symbolsNonEdgeandNbrs. (Note 8)Sometimes we need to use a whole family of instantiations of the same module. We do this as indicated by the following example:

E(Set) == { <<T1, T2>> \in (SUBSET Set) \X (SUBSET Set) : T1 \subseteq T2 } DG(S) == INSTANCE DirectedGraphs WITH Node <- SUBSET S, Edge <- E(S)(T1andT2are bound variables in the first definition.) For any expressionexp, this definesDG(exp).NonEdgeto equal the expression obtained from the definition ofNonEdgein theDirectedGraphsmodule by substitutingSUBSET SforNodeand substituting the definition ofE(exp)forEdge. In other words,DG(exp).NonEdgeequals((SUBSET exp) \X (SUBSET exp)) \ { <<$T1, $T2>> \in (SUBSET exp) \X (SUBSET exp) : $T1 \subseteq $T2 }We can think ofDG(exp).NonEdgeas meaningDG.NonEdge(exp), whereDG.NonEdgeis defined to beLAMBDA $S : ((SUBSET $S) \X (SUBSET $S)) \ { <<$T1, $T2>> \in (SUBSET $S) \X (SUBSET $S) : $T1 \subseteq $T2 }TheINSTANCEstatement above similarly definesDG(exp_1).Nbrs(exp_2,exp_3)for any expressionsexp_1,exp_2, andexp_3. Again, we can think ofDG(exp_1).Nbrs(exp_2,exp_3)as meaningDG.Nbrs(exp_1)(exp_2,exp_3), whereDG.Nbrsis defined to beLAMBDA $S : LAMBDA $n, $m : <<$n, $m>> \in { <<$T1, $T2>> \in (SUBSET $S) \X (SUBSET $S) : $T1 \subseteq $T2 }To summarize:The keyword

EXTENDS M- Adds to the current module the declarations and definitions from module
M.

INSTANCE M WITH s_1 <- exp_1, ... , s_k <- exp_k- The
s_imust be the declared symbols of moduleM. (Ifk=0, we omit the "WITH". Omitting an explicit instantiation of a declared symbolsfrom theWITHclause is equivalent to adding the instantiations <- s.) ThisINSTANCEexpression represents a mapping that assigns to each defined symboldsa definitionds_def, defined as follows. LetEXP_ibe the expression obtained fromexp_iby replacing every symbol defined in the current module with its definition, and replacing bound variables by new symbols. We letds_defequal the expression obtained from the definition ofdsinMby replacing each declared symbols_iwithEXP_i.

- If this instance expression appears as a statement by itself, then each defined symbol
dsofMis defined in the current module to equalds_def.

- If this instance expression appears in the statement
IM(p_1, ..., p_j) == INSTANCE..., thenIM.dsis defined to equalLAMBDA $p_1, ... , $p_j : DS_deffor each defined symboldsofM, whereDS_defis obtained fromds_defby replacing each symbolp_iby the new symbol$p_i. We writeIM(e_1, ... , e_j).dsinstead ofIM.ds(e_1, ... , e_j). Ifj=0, we writeIM ==...andIM.dsinstead ofIM()==...andIM().ds.INSTANCEcan be followed by a comma-separated list of instantiations, as inINSTANCE Naturals, SequencesIf a moduleMhas no declared symbols, thenEXTENDS MandINSTANCE Mare equivalent.EXTENDSshould be used only for breaking up what is logically a single specification into smaller pieces. IfMis a general-purpose module,INSTANCE Mshould be used.

One can put one module inside another, as in

------------- MODULE Outer ------------ CONSTANT z ... ----------- MODULE IMod ----------- CONSTANT x, y ... ==================================== Inner(x,y) == INSTANCE IMod ...ModuleIModis like any module, except that it can use the declared symbolzand any other symbols declared or defined before it in moduleOuter. ModuleIModmay be used only inside moduleOuter. (Note 9)Submodules such as

IModare often used in the following way. Suppose we want to defineSpec == \exists x, y : InnerSpecwhereInnerSpecis some complicated formula involvingxandy. The obvious way to do this is:CONSTANT x, y ... InnerSpec == ... Spec == \exists x, y : InnerSpecHowever, the definition ofSpecis illegal because it violates the following syntactic rule of TLA+:(The purpose of this rule is to prevent confusion between bound variables and defined or declared symbols.) We could write

- A symbol that is already declared or defined cannot be used as a bound variable.
InnerSpec(x, y) == ... Spec == \exists x, y : InnerSpec(x,y)However,InnerSpecmay be defined in terms of a sequence of other defined symbols, each of which is defined in terms ofxandy. We would have to makexandyexplicit parameters of all those definitions. Instead, using the module structure in the example above, we can put the definition of InnerSpec in submoduleIModand defineSpec == \exists x, y : Inner(x, y).InnerSpec

It often happens that a module

Aextends modulesBandC, both of which extend moduleD. For example,Dmight declare symbols and make some definitions that are used by bothBandC. ModuleAthen obtains two copies of the definitions and declarations ofD--one obtained by extendingBand the other obtained by extendingC. TLA+ allows the same symbol to be declared or defined two or more times, if the declarations or definitions are the same. (Note 10)Adding more and more definitions with each level of module inclusion will eventually lead to unintended name clashes. We don't want to know what modules are included by

Naturals(the module that defines the natural numbers and operators like+), so we need a way to avoid name clashes with symbols defined by those modules. TLA+ provides local definitions as a way of avoiding such name clashes. Preceding a definition with the keywordLOCALmakes that definition local to the current module. For example, writingLOCAL Temp(x) == x + ydefinesTempin the current module. But the definition ofTempis not obtained by a module that includes the current module--whether withEXTENDSorINSTANCE. Similarly, writingLOCAL INSTANCE M WITH ...orLOCAL Temp(x) == INSTANCE M WITH ...makes all the definitions included fromMlocal to the current module.Declarations cannot be made local. Symbols are defined in terms of declared symbols, and it wouldn't make sense to make a symbol local if it appeared in a nonlocal definition. (Remember that the statement

df == expdefinesdfto be the expression obtained fromexpby replacing defined symbols with their definitions, so the definition ofdfhas no defined symbols.) Hence, one cannot precede a declaration or anEXTENDSstatement with "LOCAL". (Remember thatEXTENDS MaddsM's declared symbols to those of the current module. IfMdeclares no symbols, one can writeINSTANCE Minstead ofEXTENDS M.)When writing a general-purpose module, definitions included from other modules should be made local. For example, a general-purpose

Graphsmodule should include theNaturalsmodule byLOCAL INSTANCE NaturalsThat way, a module that includesGraphscan define+to mean something other than addition of natural numbers.

Operators can take operators as arguments. For example

Double(A, F(_,_)) == F(A,A)definesDoubleto be an operator that takes two arguments---the first of which is an expression, and the second of which is an operator that takes two arguments, both of which are expressions. For example,Double(3,+)equals3+3, andDouble({a},Nbrs)equalsNbrs({a},{a}).TLA+ does not allow an operator to take as an argument an operator that takes as an argument an operator. In other words, there can be at most two levels of parentheses to the left of the "

==". One cannot define an operator that is any "higher-order" thanDouble.The value of a TLA+ expression cannot be an operator; it must be a simple value. (A

LAMBDAexpression is one whose value is an operator, but one cannot writeLAMBDAexpressions in TLA+.)One can declare constant symbols to be operators. For example,

CONSTANT Foo(_, _)declaresFooto be an operator that takes two arguments. Since the value of an expression cannot be an operator, the declared operatorFoocan be instantiated only by an operator that has the same number of arguments---for example, by writingINSTANCE ... WITH Foo <- +One cannot declare variable symbols to be operators.

A module can contain assumptions and theorems, of the form

ASSUME expandTHEOREM exp, whereexpis an expression. The expressionexpcan contain symbols declared or defined anywhere in the module.For simplicity, suppose that module

Mcontains only the single assumptionASSUME aand the single theoremTHEOREM t. (The generalization is obvious.) LetAandTbe the expressions obtained fromaandtby replacing all defined symbols with their definitions, so the free variables ofAandTare declared symbols. We make the syntactic requirement thatAcontain only constant symbols. (Note 11) ModuleMisvalidiff the formulaA => Tis valid. Writing a module asserts that the module is valid, so a theorem represents a proof obligation for the writer of the module.When module

Mis included with anEXTENDS Mstatement, its assumptions and theorems are added to those of the current module. When moduleMis instantiated with anINSTANCE Mexpression, only its definitions are added to the current module, not its assumptions or theorems (or declared symbols).Suppose

Mis valid, soA => Tis a valid formula. Suppose that the instantiationINSTANCE M WITH s_1 <- exp_1, ... , s_k <- exp_kis syntactically correct, meaning that every declared symbol ofMis instantiated with an expression all of whose symbols are defined or declared. Substitution preserves validity, so the formula obtained fromA => Tby replacing eachs_iwithexp_iis valid.One often write modules with assumptions but no theorems. These assumptions serve no logical function, since they are not used to prove anything (except perhaps if the module is extended by another module). However, they serve as useful comments to the reader about the values one expects these symbols to assume. For example, one might add to module

DirectedGraphsthe assumptionASSUME Edge \subseteq Node \X Nodewhich indicates that one expectsEdgeto be a set of ordered pairs of elements from the setNode. We expect the assumption to be satisfied when the module is instantiated, but we do not make this a formal requirement. (Note 12)

TLA extends "ordinary math" (first-order logic plus set theory) with declared variable symbols and a few nonconstant operators. The nonconstant operators of TLA+ are

~> (leads-to) ' (prime) -+-> (while) ENABLED \cdot (action composition) UNCHANGED [...]_ (e.g., [A]_f) \EE (temporal \exists) <<...>>_ (e.g., <<A>>_f) \AA (temporal \forall) [] <> WF SFTo define the semantics of TLA+, we take the following six to be primitive and we define the rest in terms of them:' ENABLED \cdot [] -+-> \EETLA places certain syntactic restrictions on the use of these operators. These restrictions are described in terms oflevels. Declared symbols and primitive TLA+ operators are assigned one of four levels, listed below in increasing order:The level of an expression is defined to be the level of its highest-level subexpression, except that

- Constant Level
- Contains declared constant symbols and constant-level TLA+ operators.

- State Level
- Contains declared variable symbols and the
ENABLEDoperator.

- Action Level
- Contains the operators
'and\cdot.

- Temporal Level
- Contains the operators
[],-+->, and\EE.

ENABLED Ais a state-level expression even whenAis an action-level expression. Note that in determining the level of a subexpression, bound variables are considered to have constant level except that bound variables introduced by the temporal quantifiers\EEand\AAare state-level symbols.TLA places the following syntactic restrictions on expressions.

These level restrictions place restrictions on the arguments of defined operators. For example, consider the operator

- The expression
exp'is allowed only ifexpis a state-level or contant-level expression.

- A temporal-level expression is either

- a constant-level or temporal-level operator applied to constant-level, state-level, or temporal-level expressions.

- an expression having one of the the following forms
[][A]_f <><<A>>_f WF_f(A) SF_f(A)whereAhas at most action level andfhas at most state level.Sdefined by:S(U, V, x) == U => Enabled (V \/ x')The expressionS(e_1, e_2, e_3)is legal iff expressione_2has at most action level, ande_3has at most state level. The level of the complete expression is the maximum of the level ofe_1and state level. (Note 13)A constant-level module is one that has no declared variable symbols and uses only constant-level operators. Constant-level modules are used for defining data structures and for describing "ordinary math".

We make the following rules for instantiation (by an

INSTANCEexpression):These restrictions ensure that the instantiated modules satisfy the syntactic level restrictions of TLA. They are also needed to ensure that the instantiation of a valid module is valid. For example, the TLA formula

- Except when instantiating a constant-level module, a constant symbol can be instantiated only by a constant-level expression. (Arbitrary instantiations of declared symbols are allowed when instantiating a constant-level module.)

- Variable symbols can be instantiated only by constant-level or state-level expressions.
[][c'=c]_cis valid (true for all behaviors) ifcis a declared constant symbol. Substituting a variable forcwould produce an invalid formula (one that is false for some behaviors).

We have stated that instantiation preserves validity. If

Fis defined in moduleMto equal a valid formula, and we writeI == INSTANCE M WITH ...thenI.Fis a valid formula. Instantiation is substitution, and in the presence of quantifiers, substitution preserves validity only if it is defined properly. In particular, substitution must be defined so capture of free variables by quantifiers is not allowed. For example, consider the valid formula\exists u : u /= v. Naive substitution ofuforvin this formula would yield the invalid formula\exists u : u /= u. Validity is not preserved because the free symbolu, which is being substituted forv, is captured by the quantifier\exists.The problem of capture of free variables by explicit quantifiers is avoided in TLA+ by renaming bound variables. The module

---------------- MODULE M ---------------- CONSTANT v F == \exists u : u /= v ==========================================definesFto be the formula\exists $u : $u /= v. The statementI == INSTANCE M WITH v <- uthen definesI.Fto be the valid formula\exists $u : $u /= u.Some nonconstant operators of TLA+ contain implicit quantification--most notably, the operator

ENABLED. Suppose moduleMis---------------- MODULE M ---------------- VARIABLE u, v F == ENABLED ((u' = u) /\ (v' /= v)) ==========================================ThenFis a valid formula. Now consider the statementI == INSTANCE M WITH u <- x, v <- xA naive substitution would makeI.Fthe formulaENABLED ((x' = x) /\ (x' /= x))which is equivalent toFALSE. The problem is that, within anENABLEDexpression, primed variables are really bound, so this naive substitution results in variable capture. The meaning of formulaFis really\exists $uprime, $vprime : ($uprime = u) /\ ($vprime = v)To ensure that instantiation preserves validity, we make the following rule for theENABLEDoperator:For example,

- When instantiating a module, every declared variable symbol that occurs primed in an
ENABLEDexpression is renamed to a new variable symbol before substituting for the declared symbols.------------------ MODULE M ------------------ VARIABLE u G(v, A) == ENABLED (A \/ ({u, v}' = {u, v})) H == G(u, u' /= u) ==============================================definesGto equalLAMBDA $v, $A : ENABLED ($A \/ ({u, $v}' = {u, $v}))and definesHto equalENABLED (u' /= u) \/ ({u, u}' = {u, u})Before instantiatingM, the definition ofGis changed to toLAMBDA $v, $A : ENABLED ($A \/ ({$$u, $v}' = {u, $v}))and the definition ofHis changed toENABLED ($$u' /= u) \/ ({$$u, $$u}' = {$$u, $$u})(The two "$"s in the name$$xare meant to indicate that it is a new bound variable symbol.) Note that the substitution foru'in the definition ofGoccursafterthat definition is used in definingH.As another example, consider the module

----------------- MODULE MM ------------------ VARIABLES u, v A == (u' = u) /\ (v' /= v) B(d) == ENABLED d C == B(A) ==============================================and the instantiationVARIABLE x I == INSTANCE MM WITH u <- x, v <- xThis instantiation yieldsI.A = (x' = x) /\ (x' /= x) I.B = LAMBDA $d : ENABLED $d I.C = ENABLED (($u' = x) /\ ($v' = x))Note thatI.Cis not equivalent toI.B(I.A). In fact,I.Cis valid whileI.B(I.A)is equivalent toFALSE.The other TLA+ primitives that have implicit quantification are

\cdot(action composition) and-+->(while). The rule for\cdotis similar to that forENABLED, where in the expressione_1 \cdot e_2, primed variables ine_1and unprimed variables ine_2are bound. The rule for-+->is that, before instantiating a module, each instance ofP -+-> Qis replaced by an equivalent formula containing explicit quantification. The precise formula can be found elsewhere.

Note 1- One can also write
BEGIN MODULE DirectedGraphs body ENDThe precise ASCII syntax of module delimiters and decorative horizontal lines has not been determined yet.

Note 2- The "
S" at the end of "VARIABLES" and "CONSTANTS" is optional. The keyword "PARAMETER(S)" means the same as "CONSTANTS". If you are using TLA+ for ordinary mathematics, with no actions or temporal formulas, then all the parameters are constants.

Note 3- Operators that take arguments are different from functions. Functions, and their definitions, will be discussed elsewhere.

Note 4- The precise rule for turning a definition statement into a definition is: First replace all defined symbols to the right of the
==by their definitions; "beta-reduce"LAMBDAexpressions when possible--for example, reduce(LAMBDA $a : <<$a, $a>> \in S)(Y+Z)to<<Y+Z, Y+Z>> \in S--and finally, replace all bound symbols, including the parameters of the definition (the symbolsnandmin the definition ofNbrs) by "untypable" symbols that do not already appear in the expression.

Note 5- TLA+ provides large, fixed sets of infix and postfix operator symbols; they will be described elsewhere.

Note 6- The translation from module names to modules is outside the scope of this document. Presumably, a tool will have rules for finding named modules. We suggest that a module named
Nappear in a file namedN.tla. The directory in which this file is to be found will be system dependent.

Note 7- If module
Foomakes local definitions that clash with definitions in the current module, then the body ofFoocannot be inserted into the current module. This is the only reasonEXTEND Foois not equivalent to inserting the body ofFooin the current module.

Note 8- If module
DirectedGraphshad defined an infix operator??, then this statement definesFoo.??to be an infix operator. The expressiona Foo.?? blooks strange, but the alternatives seem worse. In practice, one does not define infix operators in modules that one wants to use with this kind of renaming.

Note 9- Submodules provide a scoping mechanism for module names, which in principle could be used to define the mapping between module names and modules. For example, the modules written by user Jones of company XYZ could be submodules of a module named
Jones, which is a submodule of a module namedXYZ. However, in practice, such name scoping will be provided by directory structures and file search paths.

Note 10- Two declarations of a symbol
sare the same if they both declaresto be a constant or both declare it to be a variable. Two definitions are the same if they have parse trees that differ only in the names of bound variables.

Note 11- In general, we should define validity of
Mto meanA |= T, which means that the validity ofAimplies the validity ofT. IfAis a constant formula, thenA |= Tis equivalent to the validity ofA => T. For writing specifications, this special case is all we need, and restricting ourselves to it allows us to avoid introducing the semantic operator|=.

Note 12- It is tempting to include an instantiated module's assumptions as theorems, so they become proof obligations of the current module. However, it may be impossible to discharge those proof obligations in the current module. For example, suppose we were specifying a graph algorithm and we wrote
VARIABLES N, E INSTANTIATE DirectedGraphs WITH Node <- N, Edge <- EwhereNandErepresent variables of the algorithm. The instantiated assumption of theDirectedGraphsmodule isE \subseteq N \X N. It is impossible to prove this formula because it is not valid;NandEare flexible variables, so they can assume any values. What we do expect to be able to prove is(*) Spec => [](E \subseteq N \X N)whereSpecis the TLA formula that specifies the algorithm. Formally, there is nothing special aboutSpec; it is just one of many defined symbols. So, there is no reason to take (*) as the proof obligation associated with the instantiation. Moreover,Specmight not even be defined in the current module, but in some other module that extends the current one.

Note 13- For simplicity, we always consider
Enabled Ato have state level, even in those special cases when we could determine that it is a constant formula.