Inductive Proofs of Other Recursively Defined Structures

Subsection 8.10.7 Inductive Proofs of Other Recursively Defined Structures

Induction is a common way to prove claims about recursively defined sequences. But it’s also useful in proving claims about other kinds of recursively defined structures. We’ll present two examples here. They’re interesting. But if you choose to skip them, you can still go on to the next page

Recall our definition of a Boolean well-formed formula (or wff). It was spread out over several pages, but let’s summarize it concisely here:

T and F are wffs.

Individual variables, like p or R are wffs.

If w is a wff, so is w.

If x and y are wffs, so are x  y, x  y, x  y, and x  y.

If w is a wff, so is (w).

Nothing else is a wff.

Notice that we have exploited a recursive definition of a wff. It begins with the smallest units. Then it describes all the ways in which those smaller units can be combined to form larger ones.

This recursive definition lets us generate wffs such as:

((R  S)  ((T  R)  (Y  S)))  X

In any wff that we can build, the parentheses will be balanced (i.e, the number of (‘s equals the number of )’s and they will be properly nested. (You can see an example of proper nesting in the wff above. We’ve nested smaller pairs inside larger ones.)

We can use induction to prove that all wffs will have properly nested parentheses.

Define the length of a wff w to be the number of symbols it contains (blanks don’t count). We’ll write the length of w as:

|w|

We’ll count symbols as follows:

T and F are each symbols.

Each individual variable is a symbol (even if we give it a multi-character name like RAIN).

, , , , , (, and ) are each symbols.

So, for example, | ((R  S)  ((T  R)  (Y  S)))  X | = 25.

We’ll prove our claim by strong induction on |w|.

Define: P(n)  “Every wff of length n has balanced parentheses.”

Base cases: P(1) is true, trivially; the shortest wff that contains parentheses has length 3.

P(2) is true. The only wffs of length two have the form T, F, or p, for some variable p. Thus no parentheses.

Induction step: We must prove that, for n  2, if all wffs of length n or less have balanced parentheses, then so do all wffs of length n + 1. So we must prove:

(For all 1  i  n, P(n))  P(n + 1)

We can use Case Enumeration to help us here.

Consider any wff w of length n + 1, where n  2:

Suppose w has the form (s), for some wff s. Then |s| = |w| - 2. By the induction hypothesis, s has balanced parentheses. So w does too, since a matched pair has been added around an expression that already has balanced parentheses.

Suppose w t has the form s for some wff x. Then |s| = |w| - 1. By the induction hypothesis, s has balanced parentheses. So w does too, since none have been added or removed.

Suppose w has the form x  y, x  y, x  y, or x  y. Then both |x| and |y| must be less than |w|. By the induction hypothesis, both x and y have balanced parentheses. So w does too, since none have been added or removed.

We’ll define a language to be a collection of strings. There are various ways to describe particular languages that we might wish to reason about. We’ll present one very useful such technique here.

A grammar is a special initial string, plus a set of rules for transforming one string into another. The job of a grammar is to define a language, so we’ll say that a grammar G generates some language L. Define a useful notation:

L(G) is the language that grammar G generates.

A grammar will exploit two kinds of symbols:

Working symbols: These will be used by a grammar, G, but must be removed before we can claim that G has generated some string in L(G).

Terminal symbols: These are the symbols that can occur in strings in L(G).

And we need one more notation. We need a way to write the empty string, i.e., the string that contains no characters. It’s hard to print nothing. So we will use the special symbol:

 (read “epsilon”) stands for the empty string

Each rule in G has two parts, a left-hand side, which you can think of as a pattern, and a right-hand side, which is a string that will be substituted for the matched pattern whenever the rule is applied. So, for example, consider this simple rule:

S  a T a

The left-hand side of this rule is S. If this rule is applied, then some S that matched will be chopped out and replaced with the rule’s right-hand side (a T a).

G’s job is to start with its initial string and apply rules to derive strings in L(G). At each step, it may choose to apply any rule whose left hand side matches the string it has derived so far. To actually apply the rule, it chops out the string that matched the rule’s left hand side and replaces it with the rule’s right hand side. We call a sequence of steps such as this a derivation.

Consider the grammar G that has starting string S and the following rules:

[1] S  a S a

[2] S  b S b

[3] S  

S is a working symbol and both a and b are terminal symbols.

Let’s watch G in action. We’ll use the symbol  to indicate one step in a derivation (the process of applying G’s rules). In this example, we’ll apply rule [2] twice, then rule [1], then rule [3]. At each step, we’ve underlined the string that matches and then gets removed. Ignore blanks. They’re just here to make the example easier to read.

S  b S b  bb S bb  bba S abb  bbaabb

So we can say that bbaabb is in the language L(G).

We have just given a definition of the language L(G). It’s all and only the strings that G can generate. Notice that this definition is recursive. We define the simplest strings that G can generate and we have a way to build larger ones from those.

Given this recursive definition, we might expect that we could prove things about L(G), using induction. And we can.

We’ll leave as an exercise the proof that, given our example grammar G, every string in L(G) has even length.

Exercises Exercises

1.

Let’s return to the grammar G with starting string S and these rules:

[1] S → a S a

[2] S → b S b

[3] S → ε

Prove by induction on the length of the generated strings that every string in L(G) has even length.

Answer.

Define: P(n) = All strings of length n or less that are in L(G) have even length. Base cases: We should start with two base cases, 0 and 1. Write out your proof that all strings of length 0 or 1 that are in L(G) have even length. When you’re done, click here to check it. • n = 0: Since 0 is even, it’s trivially true that all strings of length 0 or less have even length. • n = 1: There are no strings of length 1 in L(G). To generate anything other than the empty string, we must apply either rule [1] or rule [2], and each of them adds two characters. Induction step: First, observe that that every rule in G that adds characters adds exactly two (an even number) characters. Now complete your proof. Assume the induction hypothesis, namely P(n). Show that P(n+1) must therefore also be true. If you’d like a hint, click here. When clicked should see: If we want to derive a string of length n+1, should we think about starting to derive a string of length n but then adding one more character? Or should we think about starting to derive a string of some other length and then adding some other number of characters? If you’d like another hint, click here. When clicked should see: Divide your proof into two cases. When you’re done, click here to see our proof. When clicked should see: Base cases: • n = 0: Since 0 is even, it’s trivially true that all strings of length 0 or less have even length. • n = 1: There are no strings of length 1 in L(G). To generate anything other than the empty string, we must apply either rule [1] or rule [2], and each of them adds two characters. Induction step: Observe that that every rule in G that adds characters adds exactly two (an even number) characters. Assume P(n). To generate a string of length n+1, we must start with a derivation that could generate a string of length n-1. Then, instead of applying rule [3] and stopping, we must apply either rule [1] or rule [2]. Then we must apply rule [3] and stop. Now consider two cases: • n-1 is odd: By the induction hypothesis, there are no strings of length n-1 in L(G). So there are no derivations that could get us to the point where we could add two characters then stop, and have generated a string of length n+1. So G cannot generate any strings of length n+1. • n-1 is even: We add two characters to some string of that length. Thus we still have a string of even length.