[ This section is a little out of place--need to introduce type and equality predicates first! Those have been presented in class, so this should be comprehensible anyway. Need to make this a separate hunk, and move it after the next hunk. ] In this section I'll demonstrate the most common idioms for recursion over simple data structures--lists and trees.
Some of the examples will be implementations of standard
Scheme procedures like length
, list
, append
,
and reverse
. Scheme already has these procedures built
in, but you should understand how they can be implemented using
simpler procedures like cdr
and cons
. You'll
inevitably have to write special-purpose procedures that are
slightly different, but coded similarly. (In later chapters,
I'll show some more advanced programming techniques that let
you implement more general and/or efficient procedures like
these.)
I'll also show a few other handy procedures for operating on lists, e.g., a list-copying routine.
Then I'll show recursion over simple binary trees of pairs. The normal style for recursion over trees in Scheme is slightly different from what you may be used to in languages like C or Pascal--and simpler.
length
length
is the standard Scheme procedure that returns the length
of a list. It only counts the elements along the spine of the list
(down the cdr
's).
It's easy to do this using recursion. The length
of a list is 0 if the list is empty, and otherwise it's 1 plus
the length of the rest of the list. Here's the easiest way to
define length
:
(define (length lis) (cond ((null? lis) 0) (else (+ 1 (length (cdr lis))))))
The main thing to notice about this example is the recursive structure.
The procedure can accept a pointer to either a pair or
the
empty list. The structure of the procedure corresponds directly
to the recursive definition of a (proper) list. The two-part cond
corresponds to the fact that there are two rules that characterize
lists; it figures out which case we're dealing with.
We explicitly checked for the end-of-list case, but we implicitly
assumed that otherwise the object being operated on is a
pair. This might seem like bad style, but actually it's good,
because it ensures that an error will be signaled if the argument
to length is not the empty list or a pair--the second branch
of the cond
will be taken (erroneously), but the attempt to
evaluate (cdr lis)
will signal an error.
We could make this clearer by using a three-branch cond
, with
separate branches for the two valid cases and the error case:
(define (length lis) (cond ((null? lis) 0) ((pair? lis) (+ 1 (length (cdr lis)))) (else (error "invalid argument to length")))
Here I've used the error-signaling procedure error
, which
stops execution and signals an error. (In most systems, the error
message "invalid argument to length"
will be printed and
the user will be presented with a break prompt for debugging
the problem.) Unfortunately, error
is not supported by
all Scheme systems. (Later I'll show an implementation that
should work reasonably well in any Scheme system.)
Also note that in this example, I've used lis
as the name of a list
argument, rather than list
. That's because there's a standard
Scheme procedure named list
, which will be shadowed by any
local variable with the same name. (This is because of Scheme's
unified namespace---you can't have a variable and a procedure
with the same name, for reasons that will be explained later; list
seems to be the only identifier for which this is commonly a problem.)
The above definition of length
is not tail recursive--after calling
itself, there must be a return so that 1
can be added to the value
and returned. Later I'll show a more efficient, tail-recursive version of
length
, and a more general procedure called reduce
that
can be used to construct a variety of procedures whose basic
algorithm is similar.
There are two common senses of copying, shallow copying, and deep copying. A shallow copy makes a copy of one object, and the copy has pointers to the same objects that the original did.
A deep copy copies not only the top-level objects in a data structure, but the ones below that, and so on recursively, so that a whole new data structure is created.
For lists, which are made up of more than one object, it is often
useful to copy the spine of the list, i.e., doing a deep copy along
the cdr
's only. We typically think of a list as being
like a special kind of object, even though it's really a sequence
of pair objects. It's therefore natural to copy "just the list."
If we just want to do a shallow copy, we can define pair-copy
to copy a pair, without copying anything else.
In these examples, I'll assume we only want to copy list structure--that is a connected set of pairs. Whenever we come to something that's not a pair, we stop copying and the copy shares structure with the original. (These aren't standard Scheme procedures.)
Here's a truly shallow copy, just copying a single pair:
(define (pair-copy pr) (cons (car pr) (cdr pr)))
If we want to do a deep copy, we can use recursion to copy car
or cdr
values that are also pairs. The following code for
pair-tree-deep-copy
assumes that the structure to be copied is a
tree of pairs. (If there is any shared structure, it will be copied
each time it is reached, and the copy will not have the same structure.
It will always be a tree. Preserving shared structure while copying
is harder, but can be done. If there's a directed cycle,
pair-tree-deep-copy
will loop infinitely.)
(define (pair-tree-deep-copy thing) (if (not (pair? thing)) thing (cons (pair-tree-deep-copy (car thing)) (pair-tree-deep-copy (cdr thing)))))
Notice that pair-tree-deep-copy
works on improper as well as
proper lists, but only copies the pairs. Where it gets to a non-pair
value, it stops and just uses the same value in the copy, and the copy
shares structure with the original.
The code for pair-tree-deep-copy
directly reflects the kind of
structure it copies. It can handle non-pairs, which are assumed to be
leaves of the graph of pairs that it's copying, and it can handle pairs,
which are assumed to be interior nodes of the tree. Their car
and
cdr
values may be leaves of the tree, or other pairs.
So the recursive definition of a pair-tree is:
The first rule is the base case, i.e., is the simple one that doesn't require recursion. The second is the recursive rule, which expresses the fact that an interior node's car and cdr fields can point to any kind of pair-tree: a leaf, or another interior node whose children may likewise be leaves or other interior nodes...
This is the easy way to write recursive routines over data structures--figure out a recursive description that exactly describes the expected data structures, and then use that recursive description to write a recursive description of the result you want. Then you can straightforwardly code routine that will traverse the structure and copmute that result.
Generally, we write the base case first, to make it clear where recursion ends--and so that we don't forget to write it and accidentally write infinite recursions or unhandled cases. If you do this consistently, your code will be more readable and you'll make fewer mistakes.
To copy the spine of a proper list, we can use this description of the answer we want:
A copy of a list is
car
value is the same as the car
of the original
list, and whose cdr
value is a copy of the
rest of the original list.
Here's the code:
(define (list-copy lis) (cond ((null? lis) '()) (else (cons (car lis) (list-copy (cdr lis))))
As usual, we only check to see if we're a the end of
the list, and otherwise assume the argument is a pair. Since we
take the car
and the cdr
of the pair in the latter case,
we'll get an error if the argument is not a proper list. This is usually
what we want, so that Scheme will signal an error when it gets to
the part of the list with unexpected structure.
The name list-copy
was chosen to suggest that it operates on lists,
and in Scheme terminology "list" means "proper list" by default. If
we want a routine that copies improper lists, we should call it something
else, and write a comment saying what kinds of things it works for.
Actually, lists are so common in Scheme that we could have just called
it copy
. Most procedure names begin with the name of the kind
of structure they operate on, but exceptions are made for lists and
for numbers.
append
and reverse
Two handy operations on lists are append
and reverse
;
both are standard Scheme procedures.
append
takes any number of lists as arguments, and returns a
list with all of their elements. reverse
takes a list and returns
a new list, with the same elements but in the opposite order.
Note that like most Scheme procedures, neither of these procedures is destructive--each creates a new list without side-effecting (modifying) its argument(s).
append
Append works much like list-copy
except that we have multiple
lists to deal with.
The trick to getting it right is to maintain the essential structure
of list-copy
, with the right minor differences.
For now, let's keep things simple, and just do a two-argument version
of append
, called append2
.
Our strategy is to recurse through the first list, like list-copy
,
copying one element of the list at each step. When we get to the
end, however, the base case is different--rather than terminating
the list with the empty list, we just use the second list as the
"rest" of the copy we're making.
Notice that the base case occurs when the first list is null--the
append
of an empty list and another list is just that other
list--conceptually, we cons
zero items onto the front of
that list. Concretely, we can just return that list.
Here's the recursive characterization of the result we want
car
is the car
of the first list,
and whose cdr
is the append
of the rest of
the first list and (all of) the second list.
Here's a simple two-argument version of append
:
(define (append2 lis1 lis2) (cond ((null? lis1) lis2) (else (cons (car lis1) (append2 (cdr lis1) lis2)))))
Note that append2
copies its first list argument, but the
result simply shares a pointer to the last list argument--the
last list is not copied, so the result shares structure with
that list. This is also true of the standard Scheme function
append
, which can take any number of lists as arguments.
The first n-1 lists are copied, but the last is shared.
Be sure you understand the concrete operation of the above algorithm. On the way down during recursion, we're taking the first list apart, holding onto one list element at each step. When we hit the end of the first list, recursion stops and we return the second list. On the way back up, we're consing those items onto the new list we're creating, back-to-front.
Suppose we have defined two lists, foo
and bar
, like
this:
(define foo '(x y z)) (define bar '(a b)) (define baz (append bar foo))
The result will be that baz
shares structure with foo
,
but not with bar
. Changes to the list via foo
will
also be visible via baz
.
+----------------------------------------+ | | \|/ | +---+ +---+---+ +---+---+ +---+---+ | foo | *-+--->| * | *-+---->| * | *-+----->| * | * | | +---+ +-+-+---+ +-+-+---+ +-+-+---+ | | | | | \|/ \|/ \|/ | x y z | | | +---+ +---+---+ +---+---+ | bar | *-+--->| * | *-+---->| * | * | | +---+ +-+-+---+ +-+-+---+ | | | | \|/ \|/ | a b | /|\ /|\ | | | | +---+ +---+---+ +---+---+ | baz | *-+--->| * | *-+---->| * | *-+--------------------+ +---+ +---+---+ +---+---+
In general, the result of append
shares structure with the
last argument passed to append
. If you want to avoid
this, you can pass append
an empty list as its last
argument. For example (append '(1 2 3) '())
will copy
the list (1 2 3)
.
If you're worried about efficiency, be aware that append
takes time proportional to the length of the lists that must be
copied, i.e., all but the last list being append
ed. This
usually doesn't matter, but it's a consideration for performance-critical
parts of your program, especially if you're appending long lists.
(It's common to append
short lists onto the front
of long lists, and then reverse
the result if necessary.)
reverse
reverse
returns a reversed copy of a list.
There's an easy (but slow) way to define reverse
in terms
of append
. We just take the first element off the list,
reverse the rest of the list, and append the first element to the
end of the list. We do this recursively, so that each time we
reverse the rest of the list, we're doing the same thing on
a shorter list. When we get down to the end of the list, reversing
it is a no-op: the reverse of an empty list is the empty list.
(define (reverse lis) (if (null? lis) '() (append (reverse (cdr lis)) (list (car lis)))))
Think about how this actually works. reverse
recurses down the
list, calling itself on the cdr
of the list at each recursive step,
until the recursion stops at the end of the list. (This last call
returns the empty list, which is the reverse of the empty list.)
At each step, we use car
to peel off one element of the
list, and hold onto it until the recursive call returns.
The reversed lists are handed back up through the returns,
with the cars being slapped on the rear of the list at each return
step. (To add a single item to the end of the list using append
,
we must first put it in a one-element list using list
.)
We end up constructing the new list back-to-front on the way up from the recursion. Going down recursively tears the list apart, one item at each recursive step, and coming back up adds an element to the end of the new list at each step.
This is a good example to understand, both abstractly and concretely. You should understand the concrete steps involved in taking a list apart and putting it back together backwards. On the other hand, you should also recognize that the algorithm works even if you don't pay attention to that.
Once you get the hang of recursion, it's often easy to write algorithms
without actually thinking about the little steps involved, or thinking
much about the ordering of steps. In this case, it's easy to see that
if we can reverse the rest of the list, and append the first item to
the end of that, we've reversed the whole list. We don't need to think
much about the ordering of the operations, because that falls naturally
out of the way we pass arguments to functions. We can declare that
"the reverse
of a non-empty list is the append
of
the reverse
of the rest of the list and (a list containing) the
first item in the list", and then write the code accordingly, as a
pure function---one that only depends on the values of its
arguments, and has no side effects.
By writing this recursively, we'll apply the same trick
all the way down the list. Thinking a little
more concretely--but not much--we can see that at each time we
reverse the rest of the list, the list in question will be shorter.
Somewhere we'll hit the end of the list, so we have to handle that
base case. It's usually easy to see what the right thing to do
is for the base case. In this case, we can declare that "the
reverse
of the empty list is the empty list," and add
the appropriate branch to append
.
This is a good example of how you can combine functions to create new functions, implementing algorithms without using sequencing or side effects. (Notice that if we had side effects, we'd have to think very carefully about the ordering of steps, to make sure that we used a data structure after certain changes, and before others. Bleah.)
(The following remarks about efficiency are fairly advanced--you
shouldn't worry about these things yet if they get in the way of
learning how to write programs simply and straightforwardly. You
can skip or skim them and come back to them later once you've gotten
the hang of Scheme, and want to tune the time-critical parts of
your programs for maximum efficiency. On the other hand, you
may find that thinking about the concrete details reinforces
the basic ideas.)
There are two problems coding reverse
this very simple way,
however---reverse
turns out to be one of the hardest
"simple" list routines to code efficiently. Later I'll sho
better versions that are more clever, but only very slightly more
complicated. (They'll still be recursive, and won't use loops or
assignment.)
[ where? (Later I need to show a linear-time version that uses list->vector and then reverses the vector into a list tail-recursively... ]
The first problem is that each call to append
takes time
proportional to the length of the list it's given. (Remember
that append
effectively copies all of the pairs in the first list
it's given, making a backward copy.) We have to copy the "rest" of
the list using append
, starting at each pair in the list. On
average, we copy half the list at a given recursive step, so since we
do this for every pair in the list, we have an order n-squared
algorithm.
Another problem is that we're doing things on the way back up from recursion, which turns out to be more expensive than doing things on the way down. As I'll explain in a later chapter, Scheme can do recursion very efficiently if everything is done in a forward direction, on the way down--Scheme can optimize away all but one of the returns, and the state-saving before the calls. (Luckily, this is easy to do.)
Since Scheme provides a built-in reverse
, you don't have
to think much about this. A good Scheme system will provide
a heavily-optimized implementation of reverse
that is
linear in the length of the list being reversed.
reverse
is very handy, and the efficiency of a built-in
reverse
is important, because it's usually best to
construct a list in whichever order is easy and efficient,
and then reverse the whole list if necessary. Typically, you
cons
one item onto a list at a time, or maybe append
a few items at a time, in whatever order it's easiest to create the
list. This allows you to construct the list in linear time;
with a linear-time reverse
, the overall process is still
linear-time.
==================================================================
This is the end of Hunk E.
TIME TO TRY IT OUT
At this point, you should go read Hunk F of the next chapter and work through the examples using a running Scheme system. Then return here and resume this chapter.
================================================================== (Go to Hunk F, which starts at section Lists (Hunk F).)