CS 345H: Programming Languages (Honors) Spring 2024

Lecture 3: Transition Systems

We concluded Lecture 2 by seeing some limitations of denotational semantics, and in particular, we saw that they struggle with non-terminating programs. We're going to see a solution to this problem in the form of operational semantics, which define the meaning of programs as transitions between states. But before we get there, we need to build up some infrastructure about transition systems, the foundation we'll use to define operational semantics. (A lot of the content in this lecture is borrowed from UW's excellent CSE 505 course—thank you!).

Definition of a transition system

Transition systems give us a way to model programs in terms of small steps the program can take through a space of states the program can be in. We'll refine this notion shortly, but for now a rough analogy is to think of a state as "all the variables in the program". For example, consider this program:

x = 5
while True:
    x = x + 1

We can model each iteration of this loop as a single step, and take the state of the program to be the value of the variable x. So this program starts in state 5, then steps once to state 6, steps again to state 7, and so on. In fact, this program continues stepping like this forever. This contrasts with the challenges we had earlier with denotational semantics—even though this program never terminates, we can still say things about its behavior at each "step".

What we just did in words is define a transition system for this program. More formally, a transition system is three things:

  1. A set of states $S$
  2. A set of initial states $S_0$ such that $S_0 \subseteq S$
  3. A transition relation $\rightarrow$ over $S \times S$

A quick note on notation: a binary relation is just a set of pairs of elements. For example, inequality $a < b$ is a binary relation over the natural numbers; the relation is the set $\{ (a, b)\: |\: a < b \} \subset \mathbb{N} \times \mathbb{N}$. In our case, a transition relation is just a subset of the set $S \times S$ that relates states to other states. We write $s_1 \rightarrow s_2$ to mean that the pair $(s_1, s_2)$ is in this subset, and we read this as "$s_1$ steps to $s_2$".

Now we can define our informal transition system for the above program a little more formally: the set $S$ of states is $\mathbb{N}$ (all values that $x$ could take), the set $S_0$ of initial states is the singleton $\{ 5\}$, and the transition relation is $\rightarrow \; = \{ (n, n+1) \: | \: n \in \mathbb{N} \}$. Notice that this definition is not especially precise; we can see by inspection that the program can never reach the state 4, for example, but $4 \in S$. Similarly, $(4, 5) \in \; \rightarrow$, but we know the program can never make that step. That's OK! Defining these problems away would be fairly easy in this case, but for general programs it's very hard to write down precise definitions for these sets—we would have to look at the program and figure out all possible values every variable could take, and in what combinations. This imprecision won't undermine the conclusions we draw from the transition system as a whole, because we're about to build a notion of reachability that accounts for it.
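To make this concrete, here's a small Python sketch of this transition system (an illustration only; the function names are mine, not standard notation). Since $S$ and $\rightarrow$ are infinite, we represent them as predicates rather than explicit sets:

def is_state(s):
    # S is the natural numbers
    return isinstance(s, int) and s >= 0

def is_initial(s):
    # S0 = {5}
    return s == 5

def steps_to(s1, s2):
    # (s1, s2) is in ->  iff  s2 = s1 + 1
    return is_state(s1) and s2 == s1 + 1

assert steps_to(5, 6)
assert steps_to(4, 5)      # in the relation, even though 4 is unreachable
assert not steps_to(5, 7)  # -> is exactly one step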

Just as we've discussed before with semantics, we are implicitly making a choice here when modeling the states of this program. You probably know that when executing a program like this one on a real computer, there's actually many more fine-grained states, tracking changes to things like the program counter or to individual bytes of memory. We've chosen to ignore those lower-level notions of state because they won't be relevant for the uses we have in mind, but there are certainly other use cases that need this level of detail. Conversely, there are coarser-grained notions of state that are valid but wouldn't be useful to us here, like just tracking what function or module is currently executing. We often refer to this choice of modeling state in PL as state abstraction—only capturing enough program state to get the information we need. Choosing an appropriate state abstraction is part of the art of semantics.

Reachability in a transition system

How can we use this transition system to reason about programs? One very useful notion is to talk about the states our transition system can ever arrive at, as those are the states our program might reach during execution. This idea will let us state and study properties about our program's execution, not just its final result.

First, let's extend our notation a little bit to talk about paths through the transition system rather than single steps. We defined the transition relation $s_1 \rightarrow s_2$ to mean that $s_1$ steps to $s_2$ in exactly one step. We can introduce a new notation $s_1 \rightarrow^* s_2$ to mean that $s_1$ steps to $s_2$ in any number of steps. Importantly, this includes zero steps: $s_1 \rightarrow^* s_1$ is always true, but whether $s_1 \rightarrow s_1$ is true depends on the particular transition system. We sometimes read $a \rightarrow^* b$ as "$a$ step stars to $b$", or as "$a$ can reach $b$". (If you want some big PL words, $\rightarrow^*$ is the reflexive transitive closure of the relation $\rightarrow$.)

For example, in our transition system above, we know that $5 \rightarrow 6$ and $6 \rightarrow 7$, and so we know that $5 \rightarrow^* 7$ (we can step from 5 to 7 in two steps), but $5 \not\rightarrow 7$ as we cannot reach 7 in exactly one step from 5. Note that we haven't yet disposed of the imprecision problem above: a proposition like $1 \rightarrow^* 4$ is true, even though neither state is reachable in our actual program, as all that $\rightarrow^*$ says is that starting from state 1 it is possible to reach 4 (in three steps, in fact).

But now we have the tools we need to define reachability: a state $s$ is reachable if there exists an initial state $s_0 \in S_0$ such that $s_0 \rightarrow^* s$. Reachability is how we move beyond the imprecision in our state abstraction: for example, the state 3 is not reachable in our transition system, because even though there exist states $s'$ such that $s' \rightarrow^* 3$ (1 and 2, for example), none of those states is an initial state. On the other hand, the state 7 is reachable in our transition system, because $5 \rightarrow^* 7$ and $5 \in S_0$.
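When every state has enumerable successors, we can explore reachability mechanically. Here's a sketch in Python (the helper name and interface are my own): a breadth-first search that collects every state reachable from the initial states within a bounded number of steps:

from collections import deque

def reachable(initial_states, successors, max_steps):
    # All states reachable from `initial_states` in at most
    # `max_steps` applications of the transition relation.
    seen = set(initial_states)
    frontier = deque((s, 0) for s in initial_states)
    while frontier:
        s, depth = frontier.popleft()
        if depth == max_steps:
            continue
        for s2 in successors(s):
            if s2 not in seen:
                seen.add(s2)
                frontier.append((s2, depth + 1))
    return seen

print(reachable({5}, lambda n: [n + 1], 3))  # {5, 6, 7, 8}

As expected, 3 and 4 never appear in the output, no matter how large we make the bound.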

Invariants in transition systems

Let's use this new notion of reachability to reason about our program. You probably already have some informal notion of an "invariant" of a program: a property that is always true while the program (or some piece of the program) executes. An invariant can be written as an assertion about a program state. For example, let's extend our program from earlier with an invariant:

x = 5
while True:
    assert x >= 5
    x = x + 1

Hopefully it is clear that this assertion is indeed an invariant of this program. We also learned last time that denotational semantics wouldn't be able to help us prove this invariant, as they can only talk about the final state of the program, but here we want to know about the intermediate states.

To be more formal, a property of a transition system is just a set of states $P \subseteq S$ (or equivalently, a predicate over states). Then an invariant of a transition system is a property $I$ such that $R \subseteq I$, where $R$ is the set of reachable states of the system. In other words, an invariant is a property that is true in every reachable state of the program. It might be true in other states, too, but that doesn't matter for invariance; we care only about the reachable states.

More common PL terminology: we sometimes call the set of states that satisfy an invariant the set of "safe" states, as invariants are often properties that must hold for the program to be "good". Similarly, one interesting kind of invariant is a safety property: a property saying that something bad never happens during execution. For example, "never dereferencing a null pointer" is a safety property, and it is an invariant of a program if no reachable state dereferences a null pointer.

A few examples, using our running transition system:

  1. $x \geq 5$ is an invariant: every reachable state satisfies it (we'll prove this shortly).
  2. $x \neq 4$ is an invariant for the same reason.
  3. $x = 5$ is not an invariant: the state 6 is reachable and violates it.
  4. $x \geq 0$ is an invariant too, just a weak one: it holds in every state in $S$, reachable or not.

On the surface, this idea of invariance might seem a bit unnecessary. Why bother talking about invariants and subsets of the set of reachable states $R$, when we could just directly inspect the set $R$ itself and "check" every state in it? The problem is that $R$ is often very difficult to characterize precisely---in fact, Rice's theorem tells us that deciding whether a state $s$ is in $R$ is undecidable for the transition systems of arbitrary programs. To work around this problem, we can write down a simpler invariant that might include some unreachable states, but suffices to prove whatever property we care about. Such an invariant is likely stronger than the invariant we actually care about, and so this idea is very similar to that of strengthening the inductive hypothesis that we've seen in previous lectures.
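That said, bounded exploration of $R$ is still useful in practice. Reusing the reachable sketch from above (names mine), we can check a property against a finite fragment of $R$. A check like this can refute a claimed invariant by finding a reachable bad state, but it can never prove one for an infinite-state system:

def is_invariant_up_to(prop, initial_states, successors, max_steps):
    # Check R ⊆ I on a bounded approximation of R.
    return all(prop(s)
               for s in reachable(initial_states, successors, max_steps))

succ = lambda n: [n + 1]
print(is_invariant_up_to(lambda x: x >= 5, {5}, succ, 100))  # True (so far!)
print(is_invariant_up_to(lambda x: x == 5, {5}, succ, 100))  # False: 6 is reachable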

Proving invariants by induction

How do we prove that a property is an invariant of a transition system? As is often the case in PL, we'll do it by induction. In particular, the trick is to induct over paths through the program. Let's try it out informally first, and then below we'll make it more formal and relate it to the notions of induction we developed in Lecture 1.

Theorem: x >= 5 is an invariant of the example system above.

Proof: By induction on $\rightarrow$. There are two cases:

  1. Base case: the invariant holds in every initial state. The only initial state is 5, and $5 \geq 5$.
  2. Inductive case: suppose the invariant holds in some state $s$, i.e., $s \geq 5$. The only step from $s$ is $s \rightarrow s + 1$, and $s \geq 5$ implies $s + 1 \geq 5$, so the invariant holds in every state we can step to.

What we did here was induct over paths through the program, or equivalently, induct over the set of reachable states. The base case just required us to show that the property holds in every initial state. The inductive hypothesis asked us to consider a state $s$ in which the property holds, and show that every state we can step to from $s$ maintains the property. Then, by induction, the invariant holds in every reachable state, as our proof has covered the initial state $s_0$ and every state $s'$ such that $s_0 \rightarrow^* s'$.

But there is a big catch to this approach! Let's try to do the same proof again for the property x != 4, which is obviously also an invariant of our program.

Theorem: x != 4 is an invariant of the example system above.

Proof attempt: By induction on $\rightarrow$. There are two cases:

  1. Base case: the only initial state is 5, and $5 \neq 4$, so the property holds initially.
  2. Inductive case: suppose the property holds in some state $s$, i.e., $s \neq 4$; we must show that $s + 1 \neq 4$. But if $s = 3$, then $s \neq 4$ holds and yet $s + 1 = 4$. The proof is stuck.

What gives? This invariant is obviously true, but we cannot prove it by induction because the inductive hypothesis is too weak. We call an invariant an inductive invariant if it can be proven by induction like this. But not all invariants are inductive! In this case, we know that $s = 3$ isn't reachable, and so isn't a real counterexample to the invariant. But our proof doesn't yet have enough information to be able to talk about reachability.

This is the same idea we've seen before about needing to strengthen the inductive hypothesis. Here we need to strengthen the invariant into one that is an inductive invariant, and then use that stronger invariant to prove the actual invariant we wanted.

Strengthening invariants is a skill; it's not mechanical, and while there has been some really exciting recent work to automatically determine inductive invariants of a program, it's a difficult problem in general. In this example, we're in luck, because we already proved an invariant x >= 5, and that's strong enough to prove that x != 4 is an invariant too, because $x \geq 5 \Rightarrow x \neq 4$. But in general, this is the hard part of doing PL proofs: coming up with an invariant that is (a) strong enough to prove the thing we actually wanted, and yet (b) is an inductive invariant.

Inductively defined propositions

We've been handwaving about induction a bit, so let's try to put this induction proof onto some more solid footing. In Lecture 1 we talked about inductive sets, and saw how we can use induction over inductive sets to prove a property for every member of the set.

The induction approach we were using above is just an instance of this same pattern. What we're effectively doing is inducting over the proposition $\rightarrow^* $. To see what that means, let's write down a definition of $\rightarrow^*$ as two inference rules, as we did for inductive sets in Lecture 1:

$$ \frac{}{s \rightarrow^* s} \textrm{ (base case)} $$

$$ \frac{s_1 \rightarrow s_2 \quad\quad s_2 \rightarrow^* s_3}{s_1 \rightarrow^* s_3} \textrm{ (inductive case)} $$

In words, what we're saying here is:

  1. $s$ can always reach itself (the base case).
  2. If $s_1$ can take one step to $s_2$ and then $s_2$ can reach $s_3$, then $s_1$ can reach $s_3$ as well.

What we've defined here is called an inductive proposition. A proposition is a logical statement; here, the statement is "$s_1$ can reach $s_2$". We're using an inductive definition to specify the two ways this proposition can be true: either the trivial case $s_1 = s_2$, or the inductive case where we can "add one more step" to an existing reachability proposition. You might notice that this looks pretty similar to the list ADT; the inductive case is "one step" plus "the rest of the steps", and that idea of a trace through the program is one way to think about $\rightarrow^*$.
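Since we'll eventually write definitions like this in a proof assistant, here's $\rightarrow^*$ as an inductive proposition, sketched in Lean 4 syntax for concreteness (the Coq version we'll use later looks nearly identical):

-- `Star step a b` is the reflexive transitive closure of `step`.
-- Its two constructors mirror the two inference rules above.
inductive Star {S : Type} (step : S → S → Prop) : S → S → Prop where
  | refl  (s : S) : Star step s s
  | trans (s₁ s₂ s₃ : S) : step s₁ s₂ → Star step s₂ s₃ → Star step s₁ s₃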

Importantly, in our world, when we define an inductive proposition we are defining the only ways it can be true. We do not assume the law of the excluded middle—we do not assume that every proposition is either true or false. Instead, we declare a proposition true only when we have evidence (a proof) that it's true. This is "constructive logic" (or sometimes "intuitionistic logic"), a weaker form of logic than the classical logic you're used to. It's weaker because we'll be able to prove fewer things, but in return when we do have a proof it's constructive—it tells us how something is built.

To do induction over an inductive proposition, we first prove every base case proposition. Then in the inductive cases, we get to assume that the proposition holds for the smaller case, and need to prove that it holds for every way that we can add one more step.

If it helps, you might think of "$n$ is a natural number" as being an inductively defined proposition. When we were doing induction over the natural numbers, what we were really doing was inducting over the ways this proposition could be true (of which there were two: a base case and an inductive case). This idea is sometimes called "inducting over evidence", because what we're inducting over is the ways the proposition could be true, which allows us to reach all the ways the proposition could be true (e.g., all the ways to step a program any number of times) by looking at only finitely many cases.
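In the same Lean 4 style as before, here's a sketch of that view of the naturals (I'm defining a fresh MyNat rather than using Lean's built-in Nat): there are exactly two kinds of evidence that something is a natural number.

-- "n is a natural number" has exactly two kinds of evidence:
inductive MyNat where
  | zero : MyNat              -- base case: zero is a natural number
  | succ (n : MyNat) : MyNat  -- inductive case: successors of naturals are naturals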

An induction principle for invariants

Now we can make our induction principle more solid by appealing to this definition. Our induction proofs were proving invariance in two steps (both conditions are sketched as runnable checks after the list):

  1. The invariant holds in all the initial states: $\mathsf{InitiallyHolds}(I) \equiv \forall s_0 \in S_0.\ I(s_0)$.
  2. The transition relation preserves the invariant: $\mathsf{HoldsAfterStep}(I) \equiv \forall s, s' \in S.\ (I(s) \land s \rightarrow s') \Rightarrow I(s')$.
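Both conditions are directly checkable in code once we restrict attention to a finite set of states. Here's a Python sketch (helper names mine; since our example system is infinite-state, checking a finite window of states is a sanity check, not a proof):

def initially_holds(prop, initial_states):
    # InitiallyHolds(I): I holds in every initial state.
    return all(prop(s) for s in initial_states)

def holds_after_step(prop, states, successors):
    # HoldsAfterStep(I): whenever I(s) and s -> s', then I(s').
    return all(prop(s2)
               for s in states if prop(s)
               for s2 in successors(s))

succ = lambda n: [n + 1]
window = range(0, 1000)
print(initially_holds(lambda x: x >= 5, {5}))            # True
print(holds_after_step(lambda x: x >= 5, window, succ))  # True: inductive
print(holds_after_step(lambda x: x != 4, window, succ))  # False: 3 -> 4

The output retells the story of the previous section: x >= 5 passes both checks, while x != 4 fails the second one because of the unreachable state 3.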

We can prove that this is a correct induction principle using $\rightarrow^*$. We'll start with a lemma that lifts the second condition from single steps to paths: if an invariant is preserved by every single step, it is preserved over any number of steps:

Lemma: Let $P$ be a property over $S$. If $\mathsf{HoldsAfterStep}(P)$, then for every $s_1, s_2$ in $S$ such that $s_1 \rightarrow^* s_2$, $P(s_1) \Rightarrow P(s_2)$.

Proof: Suppose that $\mathsf{HoldsAfterStep}(P)$, and let $s_1$ and $s_2$ be states such that $s_1 \rightarrow^* s_2$. Then we proceed by induction over $\rightarrow^*$. There are two constructors for $\rightarrow^*$:

  1. Base case: $s_1 \rightarrow^* s_2$ was built by the base-case rule, so $s_1 = s_2$, and $P(s_1) \Rightarrow P(s_2)$ holds trivially.
  2. Inductive case: $s_1 \rightarrow^* s_2$ was built from $s_1 \rightarrow s'$ and $s' \rightarrow^* s_2$ for some intermediate state $s'$. Suppose $P(s_1)$. By $\mathsf{HoldsAfterStep}(P)$ applied to the step $s_1 \rightarrow s'$, we get $P(s')$. By the induction hypothesis applied to $s' \rightarrow^* s_2$, we get $P(s_2)$.

With this lemma, our actual induction principle falls out fairly easily:

Theorem: Let $P$ be a property over $S$. If $\mathsf{InitiallyHolds}(P)$ and $\mathsf{HoldsAfterStep}(P)$, then $P(s)$ is true for every reachable state $s$.

Proof: A state $s$ is reachable iff there exists a state $s_0 \in S_0$ such that $s_0 \rightarrow^* s$. By the lemma, if $s_0 \rightarrow^* s$ then $P(s_0) \Rightarrow P(s)$. But by $\mathsf{InitiallyHolds}(P)$ we know that $P(s_0)$ is true. Therefore $P(s)$ is true.

This formulation of induction over transition systems is very powerful. In particular, unlike when we were inducting over program syntax with denotational semantics, induction over transition systems lets us prove properties of non-terminating programs. It also makes it much simpler to prove properties even about terminating programs that involve loops or other repetitive structures that are difficult to define denotationally. Also, inductive propositions are a great fit for proving in Coq, which "only knows inductive types"—we'll be able to encode inductive propositions as inductive types just like we've done before with inductive sets.

The other thing to take away from this example is that there are many different ways to do induction. We call this theorem an induction principle even though it's not directly inducting over an inductive set; instead, it's implicitly exploiting the inductive structure of the set. Our two conditions are sufficient to prove an invariant, but they are not necessary; indeed, we saw there are invariants that are not inductive and therefore can't be proven this way.

A more interesting example system

Here's another program that is a bit more complicated:

n = int(input())
x = 0
y = n
while y > 0:
    x = x + 1
    y = y - 1
assert x == n

where n is some arbitrary integer provided by the user.

Let's first translate this program into a transition system. Again, we'll take each iteration of the loop to be a single step of the program. This time, let's broaden our horizons a little and model integers rather than just natural numbers. Our system looks like this:

  1. The set of states $S$ is $\mathbb{Z} \times \mathbb{Z}$, the pairs of values $(x, y)$.
  2. For a given input $n$, the set of initial states $S_0$ is the singleton $\{ (0, n) \}$.
  3. The transition relation is $\rightarrow \; = \{ ((x, y), (x+1, y-1)) \: | \: y > 0 \}$.

Unlike our earlier example, this time we'd like to prove something about the final result of the program rather than its intermediate states, so our assertion is after the loop. We need to be careful: x == n is certainly not an invariant of this program! But we can use invariants to prove it. The idea will be to prove a different property that is an invariant, and choose that property such that at the end of the program, it implies our assertion.

One way to do this is to state a weaker property $y = 0 \Rightarrow x = n$, capturing the idea that once y reaches 0, which happens at the end of the loop, we want the assertion to be true. If we can prove this property is an invariant of the system, we will know that our assertion holds. We can try to prove this is an inductive invariant using our new induction principle, but skipping ahead to the $\mathsf{HoldsAfterStep}$ case, we will find a counterexample: if $n = 2$, the state $(0,1)$ is in $S$ and satisfies the property (vacuously, because $y = 1 \neq 0$), and $(0,1) \rightarrow (1,0)$, but $(1, 0)$ does not satisfy the property.
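We can confirm this counterexample mechanically by reusing the holds_after_step sketch from earlier (with the hypothetical input fixed at n = 2):

succ = lambda s: [(s[0] + 1, s[1] - 1)] if s[1] > 0 else []
weak = lambda s: s[1] != 0 or s[0] == 2  # y = 0  =>  x = n, with n = 2
window = [(x, y) for x in range(-3, 6) for y in range(-3, 6)]
print(holds_after_step(weak, window, succ))  # False, e.g. (0, 1) -> (1, 0)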

As we saw before, this doesn't mean the property is not an invariant, just that it's not an inductive invariant. We will need to proceed by strengthening our invariant. We're looking for an invariant that (a) is inductive and (b) can be used to prove the original invariant. Again, strengthening invariants is a skill that needs practice and experimentation; there's not really a mechanical way to look at this problem and crank out the invariant we need.

I happen to know that the invariant we're looking for is n = x + y. This is certainly enough to prove our original invariant: if $n = x + y$ and $y = 0$, then it must be that $x = n$. But can we prove it's an inductive invariant? Yes!

Theorem: n = x + y is an invariant of the system above.

Proof: By our induction principle for invariants. We need to prove two things:

  1. $\mathsf{InitiallyHolds}$: the only initial state is $(x, y) = (0, n)$, and $n = 0 + n$, so the invariant holds initially.
  2. $\mathsf{HoldsAfterStep}$: suppose $n = x + y$ and $(x, y) \rightarrow (x + 1, y - 1)$. Then $(x + 1) + (y - 1) = x + y = n$, so the invariant still holds after the step.

Now we're done: we know that $n = x + y$ is an invariant, and that implies that $y = 0 \Rightarrow x = n$ is an invariant, and that in turn implies that when we exit the loop in the program, the assertion holds!
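As a sanity check on this proof (not a substitute for it), here's a Python sketch that tests both conditions of the induction principle for $n = x + y$ over a sample of inputs and states; all the names here are mine:

def successors(state):
    # one loop iteration: (x, y) -> (x + 1, y - 1) when y > 0
    x, y = state
    return [(x + 1, y - 1)] if y > 0 else []

for n in range(0, 20):
    inv = lambda s, n=n: n == s[0] + s[1]
    # InitiallyHolds: the initial state is (0, n)
    assert inv((0, n))
    # HoldsAfterStep, over a window of integer states:
    for x in range(-5, 25):
        for y in range(-5, 25):
            if inv((x, y)):
                for s2 in successors((x, y)):
                    assert inv(s2)
print("n = x + y is preserved on all sampled states")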

When manual transition systems get difficult

So far we've been defining our transition systems by hand for each program. It hasn't been all that hard—we just declare all the variables, taken together, to be the state. But that approach isn't always going to work. Here's an example:

x = 0
while x < 5:
    x = x + 1
while x > 0:
    x = x - 1

We could try to define a transition system for this program where the set of states is just $\mathbb{N}$, reflecting the possible values of x. But we'd be in trouble when trying to define the transition relation: the first loop suggests a relation of $x \rightarrow x + 1$, while the second suggests $x \rightarrow x - 1$. This system allows executions that we know aren't possible, like this one:

x = 0, x = 1, x = 2, x = 1

The problem here is that we need to know "which loop" we're in to know which transitions are possible. One way to fix that would be to add a boolean to the state that tracks which loop we're in. But in general, that's a very tedious and manual solution. In the next lecture, we'll see a more general approach that deals with this issue automatically.
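Before we get there, here's what the boolean fix might look like for this small program (a sketch; the pair-of-phase-and-x state representation is my own modeling choice):

def successors(state):
    phase, x = state
    if phase == 1:
        # first loop: increment while x < 5, then move to the second loop
        return [(1, x + 1)] if x < 5 else [(2, x)]
    else:
        # second loop: decrement while x > 0, then stop
        return [(2, x - 1)] if x > 0 else []

s = (1, 0)
trace = [s]
while successors(s):
    s = successors(s)[0]
    trace.append(s)
print(trace)  # (1, 0), (1, 1), ..., (1, 5), (2, 5), (2, 4), ..., (2, 0)

The bogus execution above is now impossible: in phase 1 the only transitions increment x, and in phase 2 they only decrement it.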