[ The example compiler in compiler.scm is the skeleton of a simple compiler for a subset of Scheme, whose structure corresponds fairly closely to the example interpreter in eval.scm. ]
[ this is out of place, or needs more introductory material first: ]
Where the interpreter has a basic dispatch routine called eval
, which
can evaluate any kind of expression, the compiler has a basic dispatch
routine called compile
, which can compile any kind of expression.
Like eval
, compile
examines the expression to be compiled,
and dispatches to an appropriate routine for that kind of expression.
The routine that compiles an expression may recursively call compile to
compile subexpressions, just as the interpretive evaluator may call
eval
recursively to evaluate subexpressions.
[ This is somewhat redundant with earlier stuff, but more concrete. Should I cut it down?]
Before answering what a compiler is, let's backtrack and talk about interpreters.
An interpreter really does two things: it examines expressions, figuring out what kind of expression each one is and dispatching to the appropriate code to handle it, and it does the actual work of evaluation--looking up variables, binding arguments, calling procedures, and so on.
Typically, most of the work done by an interpreter is in the first category--our example interpreter, for example, spends a lot of time examining expressions to see if they're self-evaluating or symbols or lists, and dispatching to the appropriate procedure to evaluate that kind of expression. This dispatching is interleaved with the actual work that evaluates the expressions.
One of the problems with an interpreter is that every time an
expression is encountered, it is analyzed again. Consider an
expression like (+ foo bar)
embedded in a loop that iterates
many times. Our interpreter will encounter this expression at each
iteration of the loop, and at each iteration of the loop it will
do mostly the same things: it will examine the expression and find
out it's a list, then call eval-list
, which will further examine
it to find out it's a combination (not a special form or macro),
and call eval-combo
. Then eval-combo
will call
eval
recursively
to evaluate the subexpressions, and each call to eval
will examine
the subexpressions and dispatch to the appropriate specialized
evaluation routine. Only then do we start actually computing
the value of the expression, by computing the values of the
subexpressions +
, foo
, and bar
, i.e., looking
up the values of those variables.
We would rather factor out most of this redundant work, and examine
the expression just once to see what to do. Then each time we
"evaluate" the expressions, we can just do those things. For
the expression (+ foo bar)
, the set of actions an interpreter
will execute (leaving out all of the analysis and dispatching) is:
look up variable bar
look up variable foo
look up variable +
apply
(Here we've assumed that we evaluate subexpressions of a combination from right to left, rather than in the more intuitive left-to-right order; that's a legal way to do it in Scheme, and it turns out to be handy in a very simple compiler, as we'll explain in a minute.)
[ maybe I should change this to do args left-to-right, but the operator last, like RScheme. ]
For code like this, which doesn't have any conditionals in it,
we can convert an interpreter into a compiler very easily. We
just modify the interpreter so that instead of actually evaluating
the expressions, it just records what operations it would execute
if it were interpreting the expression. I'm intentionally being
vague as to how exactly these simple operations (like look-up-variable) work, but you should be able to see that
each of them should be translatable into a handful of machine
instructions. That's how most compilers work: they first translate
a program into an intermediate code representation, like our
look-up-variable operations, and then translate that representation
into machine instructions. (In between there may be one or more steps
that "optimize" the intermediate code, and each step may
represent the code in a different way.)
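To make this difference concrete, here's a minimal sketch contrasting the two, using made-up helper names (lookup-variable-value isn't the name used in eval.scm; it just stands for "do the lookup now"): the interpreter does the work immediately, while the compiler merely records an intermediate-code operation describing the work to be done later.

(define (eval-variable name envt)          ; interpreter: do the lookup now
  (lookup-variable-value name envt))

(define (compile-variable name)            ; compiler: record the operation
  (list (list 'look-up-variable name)))    ; returns a list of intermediate code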
So this simple compiler just "pretends" to evaluate the expression, but whenever it gets to an actual action (like looking up a variable, or calling a procedure), it simply records what action it would take if it were just an interpreter. The result is a list of actions which, if taken, will have the same effect as interpreting the expression.
Here's another example:
(let ((x 22) (y 15)) (+ x (* x y)))
Supposing that our intermediate code representation is a sequence of lists that represent operations and their operands, the code that our simple compiler will generate is:
(fetch-literal 22)
(fetch-literal 15)
(bind x y)
(look-up-variable y)
(look-up-variable x)
(look-up-variable *)
(call-procedure)
(look-up-variable x)
(look-up-variable +)
(call-procedure)
(unbind)
Later on, we'll talk in more concrete detail about where values are temporarily stored when they're looked up, and various tweaks to make it possible to translate intermediate code into smaller and faster sequences of machine instructions. For now just notice that we can string together sequences of these intermediate code operations, and if we just translate each of them into some machine instructions, we can string those sequences of machine instructions together and get a larger sequence that we can execute to evaluate the whole expression. We can execute it as many times as we want, and all of the expression analysis and dispatching will already have been done--the only work done each time it's executed is the work that actually binds variables, looks up values, calls procedures, and so on.
It's not much harder to compile conditional expressions like if
.
When we compile an if
, we need to generate code for the condition
expression, the consequent expression, and the alternative expression.
(The "consequent" is the code executed if the condition is true,
and the "alternative" is the code executed if it's false.) Then we
need to string the code for those expressions together appropriately
with some conditional branches:
<code for condition>
(branch-if-false "else-action-label")
<code for consequent>
(branch-unconditionally "end-label")
"else-action-label"
<code for alternative>
"end-label"
The labels here will actually be translated into the addresses of
the code they label, and the branches will be filled in with those
addresses. (We have to be careful to use a unique pair of new labels for each if we compile, or some other trick like that, so that we can nest if expressions and keep their labels straight.)
(One way of generating machine code from this representation is to translate each of the statements into a short sequence of assembly instructions and each label into an assembly label, stringing them together as shown. Then the assembly code can be assembled into machine code.)
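To make the stringing-together concrete, here's a rough sketch of what a compile-if routine might look like, assuming a generate-label procedure like the one described later, representing a code sequence as a list of intermediate-code instructions and label strings, and ignoring the extra bookkeeping arguments the real compile takes:

(define (compile-if condition consequent alternative)
  (let ((else-label (generate-label "else"))
        (end-label  (generate-label "end")))
    (append (compile condition)
            (list (list 'branch-if-false else-label))
            (compile consequent)
            (list (list 'branch end-label)
                  else-label)
            (compile alternative)
            (list end-label))))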
Note that for an if
, the control structure of the compiler is
actually simpler than the control structure of an interpreter. The
interpreter will evaluate the condition expression, and then decide
at run time whether to evaluate the consequent ("then") expression
or the alternative ("else") expression. The compiler will always
compile all three subexpressions, and string them together with
conditional branches that will do the deciding at run time, based
on the runtime value computed by the code for the condition expression.
Here's a slightly more complicated example:
(let ((x 15))
  (if x
      (let ((y (* 2 x)))
        (+ x y))
      #t))
translates to intermediate code roughly like:
(fetch-literal 15)
(bind x)
(lookup-variable x)
(branch-if-false "else22")
(lookup-variable x)
(fetch-literal 2)
(lookup-variable *)
(call-procedure)
(bind y)                   ; create and enter envt that binds y
(lookup-variable y)
(lookup-variable x)
(lookup-variable +)
(call-procedure)
(unbind)                   ; exit envt that binds y
(branch "end22")
"else22"
(fetch-literal #t)
"end22"
There are actually a couple of minor things wrong with the code we've generated, but this is pretty close to a workable intermediate representation.
[ Talk about machine code, interpretive virtual machines, etc. ]
The main function of the compiler is compile
, which generates
intermediate code for an expression, and which may call itself
recursively to generate code for subexpressions.
Calls to compile
hand it an expression and some bookkeeping
information we'll discuss later. Compile returns intermediate
code, plus updated versions of some of the bookkeeping information.
To start this process off, top-level forms (like the ones you type
into the read-compile-run-print loop, or
definitions of top-level procedures) are massaged a little, then
intermediate code for them is generated. Then the intermediate
code is converted into real executable code and packaged up as a
closure that can be called.
We will discuss these issues of massaging top-level forms and generating executable closures later; for now, the main thing to understand is the recursive generation of intermediate code for nested expressions.
Here's the main dispatch routine of the compiler, which is analogous
to the interpreter's eval
:
(define (compile expr c-t-envt literal-state c-t-cont)
  (cond ((symbol? expr)
         (compile-symbol expr c-t-envt literal-state c-t-cont))
        ((pair? expr)
         (compile-list expr c-t-envt literal-state c-t-cont))
        ((self-evaluating? expr)
         (compile-self-eval expr c-t-envt literal-state c-t-cont))
        (#t
         (syntax-error "Illegal expression form" expr))))
For now, ignore most of the arguments to compile
, we'll explain
them later. The main thing to notice is that it looks a lot like
eval
.
[ blah blah blah...]
[ Somewhere, it's important to bring out the difference between the mutual recursion of eval and apply in the interpreter and the way the compiler works. Eval recurs locally, but just generates code for apply... The control structure of the compiler is actually simpler than for the interpreter, because the hairy stuff just happens at run time... ]
Before trying to understand the compiler itself in more detail, it is probably best to have a concrete idea of what the representations of data are, how procedure calls work, and how registers are used. That is, you have to understand the "abstract machine" that the compiler compiles for.
An abstract machine is an abstraction of low-level operations and objects. The compiler first compiles code from the source language into this lower-level representation, and then translates the "abstract machine language" into actual executable code. (The executable code may be machine code that runs directly on the hardware, or an interpretive executable code such as bytecodes, which are interpreted by a fast interpreter.)
You can think of an abstract machine as being more like an assembler than an interpreter, but maybe a little smarter than most assemblers.
I will describe one particular set of features to make things concrete; this is not quite how RScheme works, or Scheme-48, or any particular other system that I know of, but there's nothing unusual about it except maybe its simplicity.
In fleshing out our example compiler, let's suppose our system works this way:
The registers of the abstract machine may represent hardware registers, or just storage locations that are used in these stereotyped ways. (For example, if compiling to C, the registers might be C global variables, and the C compiler might or might not let you specify that the variables should be put in hardware registers.)
We'll assume that there are several important registers that can be used by the code our compiler generates:
A VALUE register, where an expression leaves a value so that it can be used by an enclosing expression. In the case of a procedure, this is where the value is left for the caller when the procedure returns. The value register is also used when calling a procedure.

An ENVT register, which holds a pointer to the environment that code is currently executing in.

A CONT register, which holds a pointer to the chain of saved continuations of callers.

A TEMPLATE register, which holds a pointer to a special data structure associated with the procedure that is currently executing, and

A PC (program counter) register, which says which instruction we are currently executing. (If we're compiling to normal machine code, this is a special register built into the CPU for this purpose, and we use it pretty much like any other code would. If we're compiling to an interpretive executable code, this is probably a variable in the interpreter.)
The eval stack is used for holding values that have been computed by evaluating subexpressions, but not yet used or bound.
In evaluating the expression (+ foo 22)
, the three values will
be computed. When each value is computed, it will be left in
the VALUE
register. We evaluate right to left, and after
evaluating each argument, we perform a PUSH
operation on the
eval stack, which copies the value in the value register onto
the eval stack. When we get to the first subexpression (the
one that's supposed to return a function to call), we leave
the value in the value register, because that's where we put
the closure pointer for a procedure call.
The eval stack is used for two main purposes: accumulating the arguments to a procedure call, which are passed to the procedure on the eval stack when it is applied, and holding other intermediate values that the currently executing procedure has computed but not yet used or bound.
The eval stack is not used to hold intermediate values or local variables for suspended procedures--it isn't like the activation stack in a conventional implementation of C or Pascal. The values in the eval stack at any given time are only the intermediate values stored for the currently executing procedure. Intermediate values for suspended procedures are saved in the continuation chain as necessary.
When we call a procedure, the only values on the eval stack are the arguments to the procedure. Any other values used by the caller are moved from the eval stack into a continuation before calling.
The continuation chain is a data structure that fills roughly the role of an activation stack in the implementation of a conventional programming language. The continuation chain is a linked list of partial continuations, each of which is a record that stores the saved state of a procedure.
When a procedure performs a non-tail procedure call, it packages its important state information up into a partial continuation; this record saves the values of the environment, template, PC, and continuation registers, and any temporary values on the eval stack.
Once a caller has saved its state in a partial continuation, then the callee can do whatever it wants with the important registers and the evaluation stack. (This is called a caller-saves register usage convention, because the caller of a procedure is obliged to save any values that it will need when it resumes.)
Remember that continuations are allocated on the garbage collected heap and are immutable--we never modify a continuation once it's created. When we resume from a saved partial continuation, we copy the values from the partial continuation into the registers and eval stack, but that doesn't modify the partial continuation itself--it's still sitting out there on the heap. This is important for being able to implement call-with-current-continuation: it's what allows us to resume from the same continuation any number of times.
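As a sketch, a partial continuation might be represented as a small record (the field layout and names here are just for illustration) holding the saved register values and the saved eval-stack temporaries:

(define (make-partial-continuation pc envt template cont saved-temps)
  (vector 'partial-cont pc envt template cont saved-temps))

(define (partial-cont-pc k)       (vector-ref k 1))
(define (partial-cont-envt k)     (vector-ref k 2))
(define (partial-cont-template k) (vector-ref k 3))
(define (partial-cont-cont k)     (vector-ref k 4))   ; next continuation in the chain
(define (partial-cont-temps k)    (vector-ref k 5))   ; saved eval-stack values

Resuming a continuation just copies these fields back into the registers and eval stack; the record itself is never modified, which is what makes it safe to resume it more than once.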
The compiler assumes that a binding environment is a chain of frames, each of which is a vector of slots which are the variable bindings. Each frame also has a static link or scope link field, which points to the frame representing the next lexically enclosing environment.
Top-level environments are represented specially, as hash tables that map names to bindings. We'll use a hash table instead of the association lists we used in our simple interpreter, because they're faster if you have a lot of bindings. A binding object for a top-level environment is pretty much the same as in the interpreter: a little vector with two important slots: a slot for its name and another slot that is the actual value field.
Local environments are represented very differently from toplevel environments: each frame is a vector of slots, and does not store the names of the bindings. It turns out that the names are only needed at compile time, so they don't actually have to be stored in the runtime environment. (The compiler also turns out to be able to do most of the work for looking up a toplevel variable at compile time, so the speed of our hash tables is not going to be critical to our runtime speed.)
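A sketch of this frame representation, using a plain Scheme vector with slot 0 reserved for the scope link (the constructor and accessor names are invented for illustration):

(define (make-envt-frame num-bindings scope-link)
  (let ((frame (make-vector (+ num-bindings 1) #f)))
    (vector-set! frame 0 scope-link)                        ; slot 0 is the scope link
    frame))

(define (envt-frame-scope-link frame) (vector-ref frame 0))
(define (envt-frame-ref frame slot) (vector-ref frame slot))           ; slots 1..n hold values
(define (envt-frame-set! frame slot value) (vector-set! frame slot value))

Notice that nothing in the frame records the variables' names; only the values and the scope link are present at run time.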
Closures in our system are represented as objects with two fields: a pointer to the environment captured by the closure, and a pointer to an object called a template, which in turn contains a pointer to the code for the procedure.
When we evaluate the following expression
(let ((foo 22) (bar 15)) (lambda (...) ...))
we'll create an environment frame to hold the bindings of foo and
bar, and initialize them to 22
and 15
, respectively.
This environment frame will have a scope link pointing to the environment
we were executing in when we entered the let
. Inside this environment,
we'll create a closure. The closure will hold a pointer to the
new environment, and a pointer to a template object representing
the anonymous procedure being closed. The template will have a pointer to
the actual executable code. All of these things will be heap-allocated
objects, and in our implementation we'll give each one header field
showing what kind of object it is:
[ Figure: a closure object whose envt field points to the new environment frame (with slots foo = 22 and bar = 15, and a scope link pointing to the environment frame for the enclosing scope), and whose proc field points to a template object; the template's code field points to the executable code for the procedure. All of these are heap-allocated objects with headers saying what kind of object they are. ]
The template object holds not only the pointer to the actual code, but any other handy values that the compiler can compute or look up at compile time, and which should be available to the procedure at run time. We'll discuss that more later.
When we want to apply a procedure to some argument values, we put the
argument values on the eval stack, and a pointer to the closure we want
to call in the VALUE
register. Then we execute a short sequence of
instructions that does the call: it extracts the closure's environment pointer and puts it in the ENVT register, extracts the closure's template pointer and puts it in the TEMPLATE register, fetches the code pointer out of the template, and jumps to that code (i.e., puts it in the PC register).
Thus actual machine code for our "apply" operation in our intermediate representation is just a handful of instructions that do this stuff--a stereotyped little instruction sequence that destructures a closure and puts the appropriate values in registers before beginning execution of the procedure.
Because this is the way the procedure calling convention works, we know that when we begin executing the code for a procedure, the environment register will point to the right environment (where the procedure was defined) and the template register will point to the template for that procedure. Any values stored in the template by the compiler can be fetched at run time by doing an indexed load with the template register as a base.
Consider this procedure:
(define (foo x y) (list 'bar x y))
Here, the literal bar is needed by the procedure--there must be some
code in foo that will somehow fetch a pointer to the symbol bar. That's
what the template object is for. When this procedure is compiled, the
compiler accumulates a list of such literals, and when the template
object for the procedure is created, all of those values will be stored
into it. When the compiler generates code to fetch the symbol bar
,
it just looks at the symbol's position in the literal list (and thus its
offset in the template object), and generates code to do an indexed
load to fetch that value from the template at run time.
Remember that when we do a procedure call, we do not necessarily
save the state of the caller. For a non-tail call, the compiler
must generate code to save the caller's state plus code for the
actual call. For a tail call, there is no need to save the state.
Because of this, there isn't really a single "procedure call"
operation that saves the caller's state and invokes the callee.
There are two separate operations, save-continuation
and
apply
.
As mentioned above, the code sequence that performs a procedure
application assumes that the pointer to the closure to be called is
in the VALUE
register. The procedure will leave its value in that
register when it returns.
save-continuation
is the operation that saves the state of the
currently executing procedure in a partial continuation, and
pushes it onto the continuation chain.
When pushing a continuation, it is important to save all of the values on the eval stack, except for the arguments to the procedure being called. Therefore, when generating code for a combination (procedure call) expression, the code to save the caller's state does not come just before the actual code to call the procedure. This would remove the arguments to the procedure from the eval stack. Instead, the continuation save comes just before the code that generates the argument values that will be passed to the procedure:
(save-continuation <label>)   ; save everything else before computing args
<compute argn>
...
<compute arg1>
<compute callee>
(apply)
<label>
That way, the arguments to the call (and nothing else) will be on the eval stack when the procedure is called, and when the procedure returns, it will restore the other values from the partial continuation onto the eval stack.
This separation of the saving and calling code looks especially funny for nested expressions that call procedures, but it makes perfect sense.
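Here's a rough sketch of how a compile-combo for a non-tail call might lay the pieces out, with the save-continuation first and a push after each argument's code (evaluating arguments right to left, and suppressing the environment and literal bookkeeping the real compiler threads through):

(define (compile-combo-non-tail operator args)
  (let ((resume-label (generate-label "resume")))
    (append (list (list 'save-continuation resume-label))
            (apply append                          ; rightmost argument first,
                   (map (lambda (arg)              ; each followed by a push
                          (append (compile arg) (list '(push))))
                        (reverse args)))
            (compile operator)                     ; leaves the closure in the value register
            (list '(apply) resume-label))))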
save-continuation
takes an argument which is the address of
the code to execute when the continuation is resumed. This
address is saved in the partial continuation, and when the
continuation is resumed, it will be branched to (put in the
PC
register).
Now that we have a more detailed idea how the registers, eval stack, and continuations work, here's an example:
(+ (- a b) (/ c d))
compiles to intermediate code something like:
(save-continuation "resume24")  ; save cont for call to /
(lookup-variable d)             ; get value of d into value reg
(push)                          ; push value of d on eval stack
(lookup-variable c)             ; get value of c into value reg
(push)                          ; push value of c on eval stack
(lookup-variable /)             ; look up /
(apply)                         ; call /, which is in value reg after lookup
"resume24"
(push)                          ; push value returned by / on eval stack
(save-continuation "resume25")  ; save cont for -, incl. value of (/ c d)
(lookup-variable b)             ; get value of b
(push)
(lookup-variable a)             ; get value of a
(push)
(lookup-variable -)             ; get value of -
(apply)                         ; call -
"resume25"
(push)                          ; push returned value on top of restored eval stack
(lookup-variable +)             ; look up +
(apply)                         ; tail call +
Things to notice:

After each apply, the called routine (or something it directly or indirectly tail calls) will eventually do a procedure return, and pop the latest continuation we saved, restoring anything that was on the eval stack at that point, and resuming execution at the corresponding resume label.

The push after each resume label saves the value just returned--for example, the push after "resume24" saves the value returned by (/ c d) to the eval stack.

This code assumes that the value of the whole expression (+ (- a b) (/ c d)) is to be used in tail position. This code doesn't save a continuation before the final call to +. If the expression is to be used in non-tail position, we must generate slightly different code, which will save a continuation that will resume after this expression.
[ where does this go? ]
Like compile-if
, compile-combo
generates labels as
necessary to be able to name the code where execution should be resumed
after a call--in the code it generates, it puts the label just before
the intermediate code instruction to resume, and the same label
in the call to save-continuation
that should resume there.
It is easy to generate unique labels for every resumable point
in a program. We just keep a counter of labels we've used so
far, and to create a new one we append the digits representing
this number to the string "resume"
, so that we get
"resume1"
, "resume2"
, and so on.
We can write a Scheme procedure, generate-label
, which keeps a
counter, and when given a string as an argument, returns a new
string with the same characters plus the digits representing the
number in the counter. That way, we can use labels that start
with "else"
and "end"
to label the branch targets of an
if
expression, and labels that start with "resume"
to
represent the resumption points for continuation saving. This makes
the intermediate code we generate fairly understandable, while ensuring
that labels are still unique, and easy to use as assembler labels
when translating intermediate code to machine language.
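A minimal version of generate-label, keeping its counter in a closed-over variable (this uses only standard Scheme):

(define generate-label
  (let ((counter 0))
    (lambda (prefix)
      (set! counter (+ counter 1))
      (string-append prefix (number->string counter)))))

For example, (generate-label "resume") might return "resume1", and the next call "resume2".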
To get reasonable performance for our system, we'll want to treat the top-level environment differently from local variable binding environments. We'll use a trick involving lexical scope to precompute most of the work done in looking up a local variable binding, and a different trick to make it fast to look up top-level variables.
As we said before, each local variable binding contour (e.g., the
bindings introduced by a let
, or by binding the args to a procedure)
is represented at run time as a frame with slots for each variable,
plus a scope link that points to the frame representing the enclosing
contour.
A top-level environment is likely to be large, and we will want to be able to access it in special ways. We will represent it as a hash table that maps symbols (variable names) to their toplevel bindings. The bindings themselves will be represented as objects, whose main function is to have one field that holds the current value of the variable. For simplicity, we'll make them self-identifying as well: not only will the names be used as keys in the hash table, but the binding objects will hold pointers to their names as well as values.
Keep in mind that this representation is just one that's convenient. A toplevel environment could be represented as any kind of table (e.g., an association list), but we want it to be reasonably fast to access even if there are thousands of top-level variables. (We'll use a special trick to make normal accesses to top-level variable bindings very fast at run time, but we want to make them reasonably fast at compile time as well, and hash tables are good for that.)
Suppose we evaluate the following expressions at the top level, to define and initialize a couple of variables:
(define quux "fubar")

(define (double x) (+ x x))
This will modify the toplevel environment by adding bindings for
quux
and double
, in addition to what's already there:
[ Figure: the toplevel environment hash table, with entries mapping the symbols cons, quux, and double to their binding objects. Each binding object has a value field and a name field: the binding of cons holds the closure for cons, the binding of quux holds the string "fubar", and the binding of double holds a closure whose envt field points back to the toplevel environment and whose proc field points to the procedure's template. ]
Several things to note:

The string "fubar" would really be an object itself as well. As usual, we selectively abbreviate our pictorial representation to avoid cluttering things up.

The binding objects are ordinary heap-allocated objects, but they're not normally visible from inside the language. (Some implementations let you get at them from inside the language, but that's not standard.)
When the compiler compiles code for a literal, like 'foo
or
"foo"
or 22
in the following expression,
(list 'foo "foo" 22)
it notices which literals the expression will need at run time, and ensures that those literals will appear in the template of the procedure. It keeps a list of literals needed by the procedure it's compiling, and after compiling the code for the procedure, it uses that list to construct the template that goes with the code.
If foo
is the first literal encountered, it might be put into
the list first, and assigned the first free slot in the template
(after the code pointer). "foo"
might be assigned the second slot,
and so on. In the terminology of language implementation, the
template acts as a literal frame, as well as holding the pointer
to the procedure's code.
After assigning a literal a position in the template, the compiler
can generate one or two instructions that can fetch the value out
of the template, by using the address of the template, adding
the offset of that slot, and loading from the resulting address.
Since the template address is guaranteed to be in the TEMPLATE
register, this is probably just a single indexed load instruction.
In pseudo-C, it might look like:
value_register = *(template_register + offset)
where offset is computed by the compiler and therefore is probably an immediate operand to the load instruction that loads the value into the value register.
Notice that here we're taking advantage of the fact that the compiler runs in our system, and generates code that's just data in our system. The code will run in the same heap, and the compiler can therefore just compute values and put them in the template, and they'll stay around until that code is executed. (Life gets a little more complicated if you want to generate code that will be loaded into a different system and executed there.)
[ Now we should explain that the literal-state
argument to
compile
is the list of literals seen so far in compiling
a procedure. The return value of compile
is intermediate
code that includes an updated literal-state
. ]
Because of lexical scoping, it is easy for the compiler to tell the difference between a reference to a top-level variable binding and a reference to a local variable. We'll talk about that in detail later, but for now just assume that the compiler knows the difference at the point where it generates code for a variable reference.
When the compiler generates code for a top-level variable, it can usually look up the binding of that variable in the environment that the code is being generated for--the expression that defines the variable has already been executed, so the binding already exists.
The compiler can therefore do the actual lookup at compile time, e.g., hashing into the hash-table that implements a toplevel environment and getting a pointer to the actual binding object that will be referenced at run time.
To make references to this object fast, the compiler can simply put this pointer in the template for the procedure being compiled, as though it were a literal value.
Be clear on what's going on here: the compiler can't look up the value of the variable, because that's not known until the moment the variable is referenced at run time. (Before the code is executed, some other piece of code might run and change the value stored in the binding.) But the identity of the binding itself is known, and can be stashed in the literal frame.
Actually, it's just slightly more complicated than this. A variable can be used in a procedure definition before the variable itself is defined. (This is called a "forward reference.") To get around this, the compiler "cheats," and goes ahead and creates the binding object and its entry in the toplevel environment before the definition of the variable is actually encountered. At the language level, the variable hasn't been bound or given a value, but we go ahead and create the unique binding object and use it in compiling other expressions. For error checking, we put a special value in the binding to indicate that the binding isn't "real" yet--we put a reference to some object we consider "not a real value," so that we can detect uses of a variable that isn't really bound.
(In a system designed for maximum safety and early error checking,
we could ensure that each reference to a toplevel variable would
check for this value, and signal an error if it's found. If we're
not quite so concerned with early error checking, we can wait until
somebody attempts to use such a value, e.g., by adding it to
something, or taking the car
of it, and we rely on the normal
type checking of the language to tell us something's wrong at the
point that operation occurs.)
Now consider compiling a procedure like
(define (make-foo-list) (list 'foo "foo"))
The compiler will accumulate a list of toplevel bindings and literals
needed for the procedure, namely a string "foo"
, the symbol
foo
, and toplevel binding of the symbol list
. These will
be put in the template for the procedure, in the first, second, and third
slots after the code pointer. The code generated for this procedure
(assuming right-to-left evaluation) will be something like:
(fetch-literal 1)   ; get string "foo" from template slot 1
(push)              ; push it on eval stack
(fetch-literal 2)   ; get symbol foo from template slot 2
(push)              ; push it on eval stack
(fetch-literal 3)   ; get toplevel binding of list from template slot 3
(t-l-bdg-get)       ; extract value from binding
(apply)             ; (tail-)call list
Notice, of course, that we've made our intermediate code representation
more concrete now--we use slot numbers as the arguments to fetch-literal,
and we explicitly get the value of the toplevel variable from the toplevel
binding object in the value register. For setting the value in a binding,
we'll use a different intermediate code instruction, t-l-bdg-set!. (t-l-bdg-get expects the value register to hold a pointer to a toplevel binding object; it extracts the value from the binding, and leaves that value in the value register.)
[ Now we can explain more about literal states--we accumulate a list of literal values and top-level variable bindings that must be accessible when the procedure runs. ]
By now it should be very clear how you would translate each of these little operations in our intermediate representation into a few assembly-language instructions.
[ need picture? ]
We can't really look up local variable bindings at compile time the way
we can toplevel bindings--local variable bindings don't exist yet when
we're compiling the expressions that create and use them. (Consider the
fact that every time you enter a let
, or call a procedure that binds
arguments, a new binding environment frame is created. Code that executes
in such environments will see a different binding environment frame in
the environment register each time it runs.)
What we can do is take advantage of lexical scope to precompute most of the search for a variable in an environment.
Consider the execution of this procedure:
(lambda (x y)
  (let ((a <some-expression>)
        (b <some-expression>))
    (list a b x y)))
When we compute the arguments to the call to list
, it's obvious that
the first and second variables (a
and b
) will be in the first
and second slots of the first binding environment frame, pointed directly
to by the ENVT
register. This is the environment created by the
let
. The third and fourth variables (x
and y
) will
be in the next environment frame, pointed to by the scope link of the first.
The compiler can compute the lexical address of each variable binding at the point where a reference to it is compiled--that's the relative location of the variable starting from the environment register. A lexical address has two parts: the number of environment frames to skip to find the right frame, and the offset of the binding in that frame. In the above example, the lexical addresses are:
a: 0,1
b: 0,2
x: 1,1
y: 1,2
(We use the convention that frame numbers start at zero, but slot numbers appear to start at 1 because the scope link is in slot 0.)
The code generated for the reference to a can simply be an indexed load instruction, using the environment register plus an offset to grab the value in the first variable binding slot. In pseudo-C, that's something like
value_register = *(envt_register + offset)
where offset is probably 4 (bytes) to index past the scope link slot. Slightly more abstractly, its lexical address is 0,1.
The code for the reference to the variable y
would involve one level
of indirection--first the scope link pointer must be extracted from
the first environment frame, and then that can be used for an indexed
load to get the value of the second slot of the second frame:
value_register = *(envt_register)             ; get ptr to 2nd envt frame
value_register = *(value_register + offset)   ; get value of y from that frame
where offset is probably 8 (bytes) to index past the scope link
and the binding of x
.
Given this scheme, accessing a local variable takes time proportional to the number of environment frames intervening between the expression being compiled and the environment where the referenced variable is defined. That's usually fairly fast, for two reasons: lexical nesting is usually shallow, so a reference seldom has to skip more than a frame or two, and skipping each intervening frame takes only a single load of a scope link.
For these reasons, most references to local variables will take between one and five instructions. To a first approximation, the time to reference local variables can be regarded as a small constant. (A slightly snazzier compiler can reduce this by using more registers, and leaving many values in registers instead of pushing and popping them from the eval stack, but that's a more advanced technique than we want to address here.)
Computing lexical addresses is very easy for the compiler. All it needs to do is maintain a data structure called a compile-time environment, which records the structure of the runtime environment.
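For instance, the compile-time environment might be a list of frames, each frame just a list of the names bound in the corresponding runtime frame; computing a lexical address is then a simple search. (The representation and names here are just for illustration.)

;; Returns a (frame-number . slot-number) pair, or #f if NAME isn't bound
;; locally (in which case it must be a toplevel variable).  Slot numbers
;; start at 1 because slot 0 of a runtime frame holds the scope link.
(define (lexical-address name c-t-envt)
  (let frame-loop ((frames c-t-envt) (frame-num 0))
    (if (null? frames)
        #f
        (let slot-loop ((names (car frames)) (slot 1))
          (cond ((null? names)
                 (frame-loop (cdr frames) (+ frame-num 1)))
                ((eq? (car names) name)
                 (cons frame-num slot))
                (else (slot-loop (cdr names) (+ slot 1))))))))

With the earlier example, (lexical-address 'y '((a b) (x y))) returns (1 . 2), matching the table above.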
Each time the compiler compiles an expression that creates new bindings,
it extends the compile-time environment to reflect the change to the
environment structure, and when compiling expressions that will
execute in that environment, it hands the new compile-time
environment to the recursive call to compile
which will compile
that expression.
For example, when compiling a let, the compiler dispatches to compile-let, the analogue of eval-let, which does four things:

It compiles the initial value expressions. These execute in the environment outside the let, so compile-let uses the compile-time environment it was given when making recursive calls to compile to generate the initial value code.

It generates the code that creates the new runtime environment frame and binds the values of the initial value expressions.

It creates a new compile-time environment, reflecting the fact that the body of the let will execute in a new scope including the new bindings.

It calls compile-sequence to compile the body of the let, passing it the new compile-time environment.

A rough sketch along these lines follows.
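This sketch suppresses the literal-state and continuation bookkeeping and uses invented helpers (let-binding-names, let-binding-inits, let-body) to pick the expression apart; the real compile and compile-sequence take more arguments than shown here:

(define (compile-let expr c-t-envt)
  (let ((names (let-binding-names expr))   ; e.g. (x y)
        (inits (let-binding-inits expr))   ; e.g. (1234 3456)
        (body  (let-body expr)))
    (append
     ;; 1. initial value expressions, compiled in the *old* environment,
     ;;    each followed by a push onto the eval stack
     (apply append
            (map (lambda (init)
                   (append (compile init c-t-envt) (list '(push))))
                 inits))
     ;; 2. code to create the new runtime frame and bind the pushed values
     (list (list 'bind (length names)))
     ;; 3. and 4. extend the compile-time environment with the new names,
     ;;    and compile the body sequence in the extended environment
     (compile-sequence body (cons names c-t-envt)))))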
Just as the overall recursive structure of the compiler closely resembles the recursive structure of the interpreter, the role of the compile-time environment is very much like the role of the environment in the interpreter.
When the interpreter (compiler) evaluates (compiles) subexpressions that execute in the same environment as their parent expressions, it hands the recursive invocation the same environment it was given. When the interpreter (compiler) evaluates (compiles) an expression in a new environment, it hands the recursive call the new (compile-time) environment.
The structure of the compile time environment at any point in the compilation process mirrors the structure of the runtime environment where the code will execute. Unlike the interpreter's representation of the environment, however, the compile-time environment doesn't contain actual bindings--it can't, and it doesn't need to.
In effect, we split the interpreter's environment into two parts with parallel structure. Where the interpreter's environments are chains of frames holding name-binding pairs, the compiler splits those into two chains of frames: the runtime environment, whose frames hold the actual bindings, and the compile-time environment, whose frames hold the corresponding names.
Consider the expression
(let ((x 1) (y 2))
  (let ((foo 3) (bar 4))
    (list foo bar x y)))
At the point where (list foo bar x y)
is compiled or executed, the
environment for an interpreted system appears as shown on the
left, below, while the environments for a compiled system appear
as shown on the right:
[ Figure: on the left, the interpreted system's environment--a chain of two frames holding name-value pairs (foo = 3 and bar = 4, with a scope link to the frame holding x = 1 and y = 2). On the right, the compiled system's two parallel chains: a compile-time environment whose frames hold just the names (foo, bar and x, y), and a run-time environment whose frames hold just the values (3, 4 and 1, 2), each with corresponding scope links. ]
Note that there is a many-to-one relationship between the compile-time
environments and the run-time environments: a let
or lambda
expression is compiled once, and the corresponding environment frame is
created and passed to the recursive calls that compile subexpressions.
The code may be executed many times, however, and each time a run-time
environment frame will be created so that the code for subexpressions
can be executed in that environment.
Taking it step by step, the compilation of the expression
(let ((x 1234) (y 3456))
  (let ((foo z))
    (+ (- foo x) (+ bar y))))
goes as follows. (We'll assume that this expression occurs at the top level.)
Since we're compiling a top-level expression, we compile it in a compile-time environment that corresponds to the top-level environment. Toplevel environments are treated specially, because they're not created dynamically the way local binding environments are. There's a one-to-one relationship between top-level compile-time environments and the corresponding run-time environments. We'll represent the top-level compile-time environment as a special kind of environment frame, which really just holds a pointer to the top-level runtime environment so that top-level variables can be looked up.
So we start in a top-level environment, which we'll depict
as [top-level]
; we hand this to compile
along with
the expression to compile. compile
is the main dispatch that
analyzes the expression; in this case, it analyzes it and dispatches (via
compile-list
and compile-special-form
) to compile-let
,
the specialized procedure for compiling let
expressions.
compile-let
compiles the initial value expressions for x
and y
using compile-multi
, which in turn calls compile
recursively; they are compiled in the (top-level) environment, which is just
passed along because no new environments have been created yet.
In this case it doesn't matter, though, because they're just
literals. (The values 1234
and 3456
get added to the literal
list at this point.) Then compile-let
generates the code to bind
x
and y
.
So far, the code generated looks like:
(fetch-literal 1)   ; fetch 1234
(push)              ; push it on eval stack
(fetch-literal 2)   ; fetch 3456
(push)              ; push it on eval stack
(bind 2)            ; bind 2 vars (x and y), w/values from eval stack
and the literals list holds 1234
and 3456
.
compile-let
then calls compile-sequence
to compile the
body of the let
, but first it creates a new compile-time environment,
to represent the fact that the body sequence will execute in the new
runtime environment after x
and y
have been bound.
The structure of this environment is
[ x y ] -> [toplevel]
(This is our new, terse way of drawing the box-and-arrow data structure for compile time environments. I got tired of drawing ascii art.)
compile-sequence
calls compile
recursively to compile a
sequence of expressions; in this case, there's only one expression
in the body.
The recursive call to compile
dispatches (again via
compile-list
and compile-special-form) to compile-let
,
to compile the inner let
.
compile-let
compiles the initial value expressions using
compile-multi
. compile-multi
calls compile
recursively to compile the one expression in the list, the symbol
z
. (Again, the same environment is just passed along, because
we haven't created a new environment.)
The recursive call to compile now dispatches to compile-symbol
,
which looks up the binding information for the symbol z
in the
compile-time environment. There's no binding in the first frame
(containing x
and y
), so the search goes to the second frame,
which is the top-level environment, and the top-level binding
is returned. This causes a dispatch to
compile-toplevel-var-ref,
which adds the toplevel binding of z
to the literals list and
generates code to get it from the template and extract its value
at run time.
Then compile-let
generates code to bind the fetched value as the
local variable foo.
The code generated so far is:
(fetch-literal 1)   ; fetch 1234
(push)
(fetch-literal 2)   ; fetch 3456
(push)
(bind 2)            ; bind 2 values (x and y)
(fetch-literal 3)   ; get toplevel binding (of z)
(t-l-bdg-get)       ; get value from (z's) binding
(push)
(bind 1)            ; bind one variable (foo)
and the literals list contains 1234
, 3456
, and the binding
of z
.
Now compile-let
creates a new compile-time environment to represent
the environment created by the inner let
; its structure is
[ foo ] -> [ x y ] -> [toplevel]
and it passes this to compile-sequence
to compile the body of the
let. compile-sequence
calls compile
recursively once, handing
it the new environment, to compile the one body expression,
(+ (- foo x) (+ bar y))
.
The recursive call to compile
dispatches (through
compile-list
) to compile-combo
, which recursively calls
compile
three times, to generate code for the subexpressions
(+ bar y)
, (- foo x)
, and +
. Since no new bindings
are being created, the recursive calls are given the same environment.
The recursive call to compile (+ bar y)
similarly dispatches to
compile-combo
and compiles y
, bar
, and +
.
Each of these calls dispatches to compile-symbol
and the variables
are looked up in the compile-time environment. The lookup for y
returns a lexical address of 1,2, so the intermediate code generated is
(local-var-ref 1 2)
The lookup for bar doesn't find any local bindings and returns the toplevel binding, so the binding is added to the literal list and the intermediate code is
(fetch-literal 4)
(t-l-bdg-get)
Similarly, the lookup for +
doesn't find any local bindings and
returns the toplevel binding, so the binding is added to the literal
list and the intermediate code is
(fetch-literal 5)
(t-l-bdg-get)
Now the call to compile-combo
that compiles (+ bar y)
can
string these three fragments together to get
(save-continuation "resume26") ; save state for call to + (local-var-ref 1 2) ; look up y (push) (literal-lookup 4) ; get toplevel binding (of bar) (t-l-bdg-get) ; get value from bdg (of bar) (push) (literal-lookup 5) ; get toplevel binding (of +) (t-l-bdg-get) ; get value from binding (of +) (apply) ; call + "resume26"
and return that. Notice that for the argument subexpressions,
compile-combo
inserts (push)
es to save the values on the
eval stack. For the first (function position) subexpression, it leaves
the value in the value register, which is where it's expected
(by apply
).
The recursive call to compile-combo
to compile (- foo x)
goes pretty similarly to the one for (+ bar y)
, the main
difference being that both foo
and x
are found to be
local variables and compiled appropriately, with the
result being the sequence
(save-continuation "resume27") ; save state for call to - (local-var-ref 1 2) ; look up x (push) (local-var-ref 0 1) ; look up foo (push) (literal-lookup 4) ; get toplevel binding (of -) (t-l-bdg-get) ; get value from binding (of -) (apply) ; call - "resume27"
The recursive call to compile the symbol +
goes straightforwardly
to compile-symbol
, which looks up +
and finds that it's
a toplevel variable; the binding is already on the literals list, so the
code generated is:
(fetch-literal 5)               ; get toplevel binding (of +)
(t-l-bdg-get)                   ; get value from binding (of +)
and this is returned to the outer invocation of compile-combo
.
It can then string together the code for the outer +
expression,
inserting (push)es after the code for its argument subexpressions and adding an apply
at the end. (Since this outer call turns out to be a tail call, no continuation is saved for it--more on that in a moment.) This code is returned to the inner invocation of
at the end. This code is returned to the inner invocation of
compile-let
, which appends it to its code and returns it to the
outer invocation of compile-let
, which returns the entire
code sequence
(fetch-literal 1)               ; fetch 1234
(push)
(fetch-literal 2)               ; fetch 3456
(push)
(bind 2)                        ; bind 2 values (x and y)
(fetch-literal 3)               ; get toplevel binding (of z)
(t-l-bdg-get)                   ; get value from (z's) binding
(push)
(bind 1)                        ; bind one variable (foo)
(save-continuation "resume26")  ; save state for call to +
(local-var-ref 1 2)             ; look up y
(push)
(fetch-literal 4)               ; get toplevel binding (of bar)
(t-l-bdg-get)                   ; get value from bdg (of bar)
(push)
(fetch-literal 5)               ; get toplevel binding (of +)
(t-l-bdg-get)                   ; get value from binding (of +)
(apply)                         ; call +
"resume26"
(push)                          ; push value of (+ bar y)
(save-continuation "resume27")  ; save state for call to -
(local-var-ref 1 1)             ; look up x
(push)
(local-var-ref 0 1)             ; look up foo
(push)
(fetch-literal 6)               ; get toplevel binding (of -)
(t-l-bdg-get)                   ; get value from binding (of -)
(apply)                         ; call -
"resume27"
(push)                          ; push value of (- foo x)
(fetch-literal 5)               ; get toplevel binding (of +)
(t-l-bdg-get)                   ; get value from binding (of +)
(apply)                         ; (tail-)call +
One very important thing we glossed over just now when describing
the workings of the compiler was when exactly to save continuations,
and when not to. It is important to save continuations before
calling procedures, if the callee must return and resume the
execution of the caller. This is not necessary for tail calls,
and in fact Scheme requires that continuations not
be saved
for tail calls--if we save continuations before tail-calls
that happen to implement loops, we may end up with an unbounded
accumulation of unnecessary continuations.
Another important question is when returns should be executed. If a procedure ends in a tail-call, it is assumed that the callee will do a return. But eventually something actually has to do a return, and pass control back to its caller (or the caller of its caller... whatever). This situation occurs when the tail expression of a procedure is not another procedure call, e.g., returning the value of a variable or a literal.
The general rule is that if a procedure call is the last thing a procedure does, no continuation should be saved--we can just jump into the callee, and since our state was not saved, the callee's return will resume our caller. To get this right, we just have to keep track of which expressions are being compiled in "tail position."
In the procedure
(lambda (x) (if (foo x) (bar (quux x)) (baz)))
the if
expression is in tail position, because the value of
the if will be returned as the value of the procedure. The
condition expression (foo x)
is not in tail position, because
after it is executed, control must return to this procedure
so that either the consequent expression (bar (quux x))
or
the alternative expression (baz)
can be executed.
Note that both the consequent and the alternative expressions are in tail-position; whichever is executed, that will be the last thing this procedure does, and the value computed will be the result of this procedure.
On the other hand, if we modify the procedure to always
return #f
, none of these expressions is in tail position.
(lambda (x) (if (foo x) (bar (quux x)) (baz)) #f)
That's because now the expression #f
is in tail position,
not the if
expression; whatever the if does, control must
come back to this procedure so that the value #f
can be
returned.
Notice that the values to compute the arguments of a combination (procedure call) are never in tail position--after computing them, control must always come back so that the procedure can be applied. The combination itself may be a tail call, of course, in which case once the arguments are computed, the apply may happen and control may never return.
To get this right, all that is necessary is that each
recursive call to compile should know whether the code being
compiled is going to be used in tail position or not; for this
we use a compile-time continuation. (Fear not--it's simpler
than compile time environments. It's really just a flag that
gets passed along to recursive calls to compile
, sometimes
getting turned off along the way.)
Keep in mind that tails of tails are in tail positions, but non-tail subexpressions are not. So in the case of nested ifs where the outer if is in tail position, only the consequents and alternatives of the ifs that form the outer if's consequent and alternative are in tail position, e.g., in
(lambda ()
  (if (if (a) (b) (c))
      (if (d) (e) (f))
      (if (g) (h) (i))))
the tail calls are (e)
, (f)
, (h)
, and (i)
.
All of the calls in the first inner if
must return, because the
value returned must be used by the outer if
. The calls to the
condition expressions in the other two inner ifs must also return, because
the values must be used to tell which of their alternative and
consequent to use.
For each basic kind of expression, we can tell which subexpressions should be considered tails if the overall expression is in tail position:

In a sequence (such as a procedure body), only the last subexpression can be a tail; the others must always return so that the next expression in the sequence can be executed.

In an if, the consequent and the alternative inherit the tail position of the whole expression, but the condition is never a tail.

In a combination (procedure call), none of the subexpressions are tails--the argument and operator expressions must always return--though the call itself may be a tail call.

In a let, the initial value expressions for bindings are never tails, and the body is just a sequence, whose last subexpression can be a tail.
When we compile something in tail position, we pass compile
a flag saying so. The flag will be examined, and passed along
to subexpressions if appropriate for compiling the kind of
subexpression in question.
For example, if compile-sequence
is handed a flag saying it should
compile for tail position, it will pass the tail flag along when
calling compile
recursively on its last subexpression. For its
other subexpressions, however, it will always pass the non-tail
flag, because they must always return to execute the next expression
in the sequence.
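A sketch of that flag-passing, pretending for brevity that compile takes just an expression and a tail? flag (the real one also threads the environment and literal state through):

(define (compile-sequence exprs tail?)
  (cond ((null? exprs) '())
        ((null? (cdr exprs))
         (compile (car exprs) tail?))              ; last expression inherits the flag
        (else
         (append (compile (car exprs) #f)          ; earlier expressions must return
                 (compile-sequence (cdr exprs) tail?)))))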
Similarly, compile-if
will pass whatever flag it is given along
when calling compile
for its consequent and alternative subexpressions,
but never when compiling its condition expression.
compile-combo
will always pass along a non-tail flag when
calling compile on its subexpressions, but will examine the flag
it's given to tell whether it should save a continuation before
evaluating all of them.
compile-lambda
will always compile body expressions in non-tail
position, except for the last one, which is always compiled in
tail position. (For simplicity, compile-lambda
just hands the
whole body to compile-sequence
, with a tail flag.)
compile-let
always compiles its initial value expressions in
non-tail position, and its body expressions like a sequence. (For
simplicity, it just hands the whole body to compile-sequence
,
with whatever flag it's given.)
As mentioned above, when an expression other than a procedure
call is a tail of a procedure, it must be accompanied by a
return. That is, every tail of a procedure must be either
an apply
(which will call something which will return, perhaps
indirectly because of tail calling) or a return
.
The compiler can handle this by ensuring that wherever
we generate intermediate code that is a leaf of the expression
graph (e.g., in compile-variable-ref
and compile-literal
),
we check the compile-time continuation flag to see if the
expression occurs in tail position. If so, rather than simply
leaving the value in the value register, we also execute a return
sequence--a series of instructions that will grab the values
out of the first partial continuation on the chain, and restore
them into the registers and evaluation stack to resume the
suspended procedure. We have a special intermediate code
instruction that stands for this sequence, called return.
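A sketch of a leaf case with that check, again pretending compile's helpers take just what's relevant here, and writing the literal itself into the instruction (rather than a template slot) as in the very first examples:

(define (compile-self-eval value tail?)
  (let ((code (list (list 'fetch-literal value))))
    (if tail?
        (append code (list '(return)))   ; leaf in tail position: add a return sequence
        code)))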
Consider the following procedure:
(lambda (a b c) (if (if a (b) c) d (e)))
When compiling its body, we dispatch through compile-sequence
and recursively call compile
to compile the if
in tail
position. It recursively calls compile
to compile the nested
if in non-tail position, which in turn recursively calls
compile
to compile a
, (b)
and c
in non-tail
position.
Note that a
is a leaf expression, and since it's in non-tail
position, it can just leave its value in the value register.
The subsequent code (the test for false and conditional branch
that's part of the code for the inner if
) will expect that value
there, so that's fine.
The expression (b)
is not in tail position, because it inherits
non-tail position from the inner if
, so a continuation must be
saved before the call to b
. When b
returns, its value will be
in the value register and execution will resume at the branch
that is part of the if
.
Similarly, the expression c
is in non-tail position (which it
also inherited from the inner if); it can just leave its value
in the value register where subsequent code can find it. (In
this case, it's the value returned by the inner if
, and tested
by the outer if
's test for false and conditional branch.)
The expression d
is different. It's in tail position, and it's
a leaf (not a call). It can't just leave its value in the value register, because it's the end of the procedure; it must therefore have a return sequence tacked onto it.
The expression (e)
is just a tail call, which can just call
e
without saving a continuation. Whatever e calls can do whatever
it wants, and probably something will eventually leave something
in the value register and pop the caller's continuation.
The code generated for the above procedure is:
(bind 3)                      ; bind args (a, b, and c)
(local-var-ref 0 1)           ; get value of a
(push)
(branch-on-false "else32")    ; compare and maybe br to inner else
(save-continuation "resume33")
(local-var-ref 0 2)           ; get value of b
(apply)                       ; call b
"resume33"
(branch "end32")
"else32"
(local-var-ref 0 3)           ; get value of c
"end32"
(branch-on-false "else31")    ; compare and maybe br to outer else
(fetch-literal 1)             ; get toplevel binding of d
(t-l-var-get)                 ; get value of d from binding
(return)                      ; and return it
(branch "end31")
"else31"
(fetch-literal 2)             ; get toplevel binding of e
(t-l-var-get)                 ; get value of e from binding
(apply)                       ; and tail-call it
"end31"
(Notice that when we compiled the outer if, we generated a branch
that can never be taken: compile-if emits a branch to a label at
the end of its code, so that after executing the consequent,
control resumes at whatever code follows the if.
In the case of this if
, the consequent will always execute a
return before encountering the branch. A slightly smarter compiler
would probably recognize this situation, and eliminate the branch.)
We said earlier that the compiler mainly uses recursion to generate intermediate code for nested expressions. To make this useful, though, at some point the intermediate code for a top-level expression must be converted into actual executable code and packaged up so that it can be called.
Suppose we interact with our system via a read-eval-print loop where eval is really implemented by compiling the expression and executing the resulting compiled code.
To make it easy to implement this, it's nice if there aren't very many kinds of top-level expressions that the compiler has to generate code for and be able to actually call. In particular, it's convenient if different kinds of expression can be transformed into the same kind of expression. The easy way to do this is to make all different kinds of executable expressions into expressions that generate procedures, and then call those procedures.
If we type
(let ((x 2)) (+ x x))
to the r.e.p. loop, the r.e.p. loop can simply wrap it up in a lambda expression, compile that, package the result as something executable, and call it. In effect, the read-eval-print loop converts it to
(lambda () (let ((x 2)) (+ x x)))
before compiling it, and call the resulting closure to execute it.
Likewise, expressions like
(set! foo quux)
and
(if bar baz)
can be wrapped up as
(lambda () (set! foo quux))
and
(lambda () (if bar baz))
Now when we start compiling, we only have to deal with one kind of thing--a whole procedure, and when we get the resulting code back and package it up to run it, we'll always be dealing with the code for a whole procedure. That makes it easy to create an actual closure to call.
The main routine we use to start off compilation is compile-procedure
,
which takes an expression, a compile-time environment, a compile-time
continuation, and a literal list as arguments. It returns
intermediate code and an updated literal list for the procedure.
We take the intermediate code and hand it to the procedure
intermediate-code->executable-code
which generates the executable
code object. (This may be by translating the sequence of intermediate
code instructions into the equivalent sequences of assembly language
instructions, and running that through an assembler. Before doing
the assembly, it may run the intermediate code through one or more
optimization phases.)
We take the resulting executable code and the literals list, and
hand them to make-template
to create the template object.
Now we can hand the appropriate runtime environment and the
template to make-closure
and get back a callable closure for the
procedure.
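Putting these pieces together, "eval by compiling" might look roughly like the following sketch. The argument order for compile-procedure, the use of a (code . literals) pair as its result, and the names the-toplevel-c-t-envt and the-toplevel-envt are guesses for illustration, not the actual interfaces.
;; Rough sketch of evaluating an expression by compiling and calling it.
(define (eval-via-compiler expr)
  (let* ((wrapped `(lambda () ,expr))            ; wrap it up as a procedure
         (result (compile-procedure wrapped the-toplevel-c-t-envt 'tail '()))
         (icode (car result))                    ; intermediate code...
         (literals (cdr result))                 ; ...and the updated literal list
         (code (intermediate-code->executable-code icode))
         (template (make-template code literals))
         (closure (make-closure the-toplevel-envt template)))
    (closure)))                                  ; call it to get the value
A read-eval-print loop then just reads an expression, hands it to eval-via-compiler, and prints the result.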
lambda Expressions Inside Procedures
When we compile a lambda
expression, we must generate code that
will create a closure at run time. A very naive way to do this
would be to generate code that would call the compiler at runtime
to compile the lambda
expression into a procedure, plus a little
code to create a closure of that object in the runtime environment.
Luckily, this is not necessary, and the compiler can do all of
the real compilation at compile time--since the code for the
lambda
expression will be the same every time it's executed,
and since lexical scope guarantees that it will always execute
in an environment with the same structure, only one version of
the code is needed, and it can be shared among all closures of
that procedure. The template can be shared as well.
The compiler therefore generates code and a template for the lambda
procedure; at run time, the actual code for the lambda
expression
just makes a closure on the heap and initializes its environment
pointer and template pointer. This code will get the environment
pointer from the environment register (and put it in the environment
field of the new closure); the template pointer will be the pointer
to the template for the lambda
procedure.
To allow this little code sequence to quickly grab the template
for the procedure being closed, the compiler stores a pointer to
that template in the template of the procedure which executes
the lambda
expression. For example, if a lambda
expression is encountered while compiling procedure foo
,
the compiler will compile the lambda
procedure and store
its template in the template of foo
. (While compiling foo
,
it simply records the pointer to the new lambda
procedure's
template as another literal. Then it will
end up in foo
's template like other literals.)
So the code generated for
... (lambda (x) (...)) ...
looks like
...
(envt-reg-get)      ; primitive to copy envt. reg. onto eval stack
(push)
(fetch-literal 15)  ; grab template pointer for lambda proc
(push)
(make-closure)      ; code that will create closure w/ those values
...
The real trick is in compiling the lambda
procedure and stuffing
its template into the template of the procedure that contains
the lambda
expression. The compiler just calls itself to generate
the code and template then saves the template in the literal list
and generates code like the above to reference the right literal.
We said above that we can deal with top-level expressions by turning
them all into lambda
expressions, which will have the effect of
evaluating those expressions when called.
This is a little bit tricky when dealing with top-level definitions, because top-level definitions can't be nested inside lambda expressions in the obvious way--they'd just become local definitions.
We therefore have to treat them specially. We use a special procedure,
environment-define!
, which operates on top-level environments and
allows us to create top-level bindings. This is not a standard
Scheme procedure--it's a special procedure that the compiler can
generate calls to, but normal portable Scheme code cannot use directly.
The read-eval-print-loop will therefore recognize top-level
definitions and treat them specially. When it encounters one,
the initial-value expression will be wrapped up as a lambda
and
compiled, and the results turned into code, a template, and a
closure. (The closure is given the runtime toplevel environment
pointer.)
The closure will be called to get a result for the initial value
expression, and environment-define!
will be used to create and
initialize the toplevel variable.
(This might appear at first to cause a scoping problem: if the
binding isn't created until after the initial value expression is
compiled, the compiler won't see the binding. But recall that if
we compile an expression that uses an undefined variable, we assume
it's a toplevel variable and create a binding object for it, and
mark that object invalid. If the binding has already been created
by a forward reference in this way, environment-define!
will
just overwrite the marker with a real value.)
Of course, if the top-level definition uses procedure definition
syntax, it is necessary to massage that into a lambda
expression
before doing the above massaging and compiling.
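In code, the r.e.p. loop's special handling of top-level definitions might look something like this sketch, reusing the hypothetical eval-via-compiler and the-toplevel-envt names from the earlier sketch; the massaging of procedure-definition syntax is folded in.
;; Rough sketch: handle both (define name init) and (define (name . args) body ...)
(define (eval-toplevel-define defn)
  (let* ((pattern (cadr defn))
         (name (if (pair? pattern) (car pattern) pattern))
         (init (if (pair? pattern)
                   `(lambda ,(cdr pattern) ,@(cddr defn))  ; procedure-definition syntax
                   (caddr defn))))
    ;; compile and run the initial-value expression, then create (or fill in)
    ;; the top-level binding with the result
    (environment-define! the-toplevel-envt name (eval-via-compiler init))))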
In order to support garbage collection (as is required for Scheme), there must be some agreement between the compiler and the garbage collector, so that the collector can find the pointers that the running program might find, and ensure that all objects the program could reach from them are preserved.
A common way of doing this (used in RScheme and Scheme-48) is to have a fixed set of registers (plus maybe an eval stack) that hold all of the root values that the collector needs to know about, and guarantee that whenever garbage collection occurs, all pointers will be identifiable as such. Any given register must be known to never contain pointers, to always contain a pointer, or to contain self-identifying (tagged) values that are decodable to see if they're pointers.
For example, in the straightforward compiled system we've described
in detail, the VALUE
register and the EVAL
stack only
contain normal Scheme values: tagged values that can be checked to see
if they're pointers. On the other hand, the template and procedure
pointers would probably always contain raw pointers, since they can
only point at one kind of thing, and the tags would slow some things
down.
There might also be some other registers, which always contain nonpointers.
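As a purely illustrative example of the kind of agreement involved, suppose heap pointers are tagged with low bits of zero, and suppose we have a bitwise-and procedure (as in SRFI 60); then the collector's test for whether a root word is a pointer is just a mask and compare. The tag assignment here is invented for the example.
;; Illustrative only: the actual tag assignments are an implementation choice.
(define pointer-tag-mask #b11)   ; low two bits of a tagged word

(define (tagged-pointer? word)
  (= 0 (bitwise-and word pointer-tag-mask)))   ; a 00 tag means "heap pointer"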
Many systems (like RScheme and Scheme-48) ensure that garbage collection can only happen when a program is at a well-defined "safe point", not at an arbitrary point in the code. At a safe point, all pointer values are known to be recognizable. In between safe points, it's okay for the code to use values that aren't properly decipherable by the GC. (For example, a register that normally contains only tagged values might briefly hold a raw pointer.)
This is easy in a single-threaded system; the GC just keeps some space in reserve, so that it never runs out of memory between safe points. If an allocation requires dipping into this reserve, a flag is set so that a GC will occur at the next safe point.
The usual trick is to ensure that each procedure call and backward branch is a safe point. This ensures that a program (or thread) reaches safe points frequently, so a pending collection is never postponed for long.
It's a little bit trickier in a multithreaded system--you have to make sure that threads are only suspended at safe points, so that if one thread forces a GC while another thread is suspended, the suspended thread's registers are still in a state the collector can decipher.
Some systems do not use safe points, and in effect make every point in the code a safe point for collection. They ensure that no matter where a GC occurs (or where a thread was suspended before the GC occurred), there will be enough information lying around so that the collector can find all of the pointers it needs to find.
Some compilers do this by restricting the way registers are used and code is generated. (For example, the Orbit compiler only uses certain registers to hold pointers, and only uses certain others to hold nonpointers. In addition, all pointers in registers must point directly to the beginning of an object; array indexing cannot be converted into arbitrary pointer arithmetic by the optimizing compiler.)
Other compilers allow more use of odd representations and more flexible use of registers, so that what each register holds can be figured out at run time. For example, a register might be assumed to hold nonpointers, except at points in the code flagged by the compiler, based on its having register-allocated a variable there.
The system we've described so far generates fairly slow code, and a major culprit is the frequency of continuation saving and procedure calls. Even very small, frequently-executed procedures like eq?, car, cdr, and + require a handful of instructions to call and another handful to return, plus another handful to save a continuation if it's a non-tail call. This is much slower than the cost of similar operations in conventional languages like C or Pascal; in those languages, these simple little "operations" don't have the semantics of calls to first-class procedures.
Sometimes it is desirable to trade away some of the purity and
elegance of a language like Scheme, accepting reduced flexibility
in exchange for better performance. One way of doing this is by declaring
frequently-used small procedures not
to be redefinable, and allowing
the compiler to compile those operations inline rather than as
procedure calls. In some systems this only works for built-in
procedures that the compiler understands, but in others the compiler
is smart enough to inline user-defined procedures if so directed.
In some Scheme systems, you can declare procedures to be inlinable, or use a compiler flag that says you promise not to redefine the common little procedures that are most valuable to inline. This means that you can't change the definition of something like + on the fly, but you seldom want to. A common tradeoff is to avoid inlining any but the most frequently-called procedures during program development, and once the program is finished, recompile with lots of inlining. This gives you the flexibility to modify procedure definitions on the fly during debugging, while getting maximum speed once it's clear which procedures won't ever be redefined in normal operation.
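One concrete example: MIT Scheme programs commonly begin with a declaration like the following, which promises that the usual built-in procedures keep their standard definitions so the compiler may open-code (inline) them.
(declare (usual-integrations))   ; promise not to redefine +, car, cdr, etc.,
                                 ; so calls to them may be compiled inline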
Some high-tech compilers use advanced techniques to do lots of inlining when it's safe, without reducing flexibility much or requiring the user to supply a lot of declarations.
The Self compiler aggressively inlines code, and automatically recompiles the code that is invalidated by changes to procedure definitions. (This compiler is for the language Self, not Scheme, but similar techniques could be applied to Scheme.)
Some compilers currently in development have a special mode for compiling finished programs which will not be used with a read-eval-print loop. Such a compiler takes advantage of the fact that if it can look at the whole program (rather than having parts typed in by the user interactively), it can tell which variables could ever be modified at run time. (As long as there are no calls to eval at run time, the compiler can tell that all of the code for the program exists at compile-time; new closures may be created at run time, but not totally new procedures.) After globally determining that there is no code in the program that could change the definition of a procedure, it is free to inline the code for that procedure into its callers.
Another key performance problem with naive implementations of Scheme (or other dynamically typed languages) is that basic operations are generally slow relative to their execution in conventional statically-typed languages. For example, the Scheme procedure + must check the types of its arguments and (depending on those types) execute any of several possible code sequences to add two numbers. Usually, the checking overhead alone is several times greater than the cost of the actual addition.
One way of reducing this cost is by extending Scheme to allow the user to declare the types of some variables. The compiler may be able to use this information to compile fast versions of operations for values of known types. (This is especially true if common operations are inlined--the compiler can choose to inline the appropriate version rather than the more general code.)
Another way of reducing type checking cost is for the system to
automatically infer the types of some expressions. For example,
consider the expression (+ a 22)
. Since 22
is a literal,
its type is known at compile time. If the compiler can inline the
+
procedure, it may at least omit the type check of that argument.
A combination of declarations and inferencing can work well. For
example, if the user has declared variable a to be of type <integer>
,
then the compiler can tell that (+ a 22)
is an expression whose
arguments are integers (so no run-time type tests are necessary there)
and whose result is an integer, which may eliminate the need for
type checks by the expression that uses the value.
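Since declaration syntax varies from system to system, here's a hypothetical example of the kind of thing such an extension allows; the declare form shown is made up for illustration and not taken from any particular system.
(define (add-22 a)
  (declare (type <integer> a))   ; hypothetical declaration syntax
  (+ a 22))                      ; the compiler can omit the argument type checks,
                                 ; and knows the result is an integer too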
More aggressive schemes are possible for reducing the frequency of dynamic type checks. For example, the Self compiler aggressively inlines and transforms code so that multiple dynamic type checks can be collapsed into a single one.
There are a number of fairly straightforward ways to improve on the simple abstract machine and compiler we've described.
For example, it's very likely a good idea to use more registers, and either not have an eval stack or not use it as often. Our simple abstract machine requires arguments to be passed on the eval stack, which means storing into memory at least once for each argument, and loading back from memory when arguments are used. Most modern machines have several hardware registers available for argument passing, and more for holding intermediate values of computations.
If we have a few more registers that can be used for argument passing, we could just leave the argument values in those known registers, and procedures could expect them there. In many cases, argument values could be computed in a way that the result is left in the appropriate argument-passing register, without having to copy it there from somewhere else. Similarly, in many cases, procedures could leave their arguments in the argument-passing registers and use them there, without actually copying them into a binding environment on the heap. (Even if only a few registers can be devoted to this, it will account for the large majority of arguments passed, since most procedure calls are to procedures that take between one and three arguments.)
Similarly, in many cases a temporary value generated by evaluating a subexpression could be left in a register, and then used by another expression, without pushing and popping the eval stack.
This can be a big performance win--it is much faster to operate on arguments and temporary values that are already in registers, rather than copying them to and from memory all of the time.
Using more registers can make the compiler and runtime system more complicated. If variables are in registers when continuations are saved, their values must be saved in the continuations and restored at procedure returns. This requires the compiler to keep track of which registers are in use at which points, and generate appropriate code. It also complicates the interface between the compiled code and the garbage collector; the garbage collector must be able to find all of the pointer values that are stored in registers, so that it can find all of the reachable objects. The compiler must therefore record sufficient information that all pointer values can be found at garbage collection time. (Alternatively, the compiler may record a safe approximation of the information, and require the collector to make conservative guesses about what's what.)
One of the performance problems with a naive implementation of
Scheme is that in the general case, variable bindings must be
allocated on the garbage-collected heap, and procedure calls
must be via pointers to closures. This is often much slower than
the usual implementation of conventional programming languages,
which don't have to support lambda
. Allocating closures and
environments on the heap is mainly slow because creating and
accessing variable bindings is slower than if the variables
were allocated on a stack or in registers.
A smart Scheme compiler can get rid of most of this overhead by analyzing programs and noticing that many closures are used in stereotyped ways, and calls to them can be implemented more cheaply than the naive implementation. Similarly, analysis of expressions may reveal that most binding environments can't possibly be captured by closures, and therefore don't need to be allocated on the garbage-collected heap. The bindings can be saved in continuations along with temporary values, or a more conventional stack may be used, or (best of all), the bindings can be register-allocated.
A simple example of a language-level closure that doesn't need
the fully general naive implementation is a closure created
by a lambda
expression that appears in the function position
of a combination:
((lambda (x) (+ x x)) 2)
(Recall that constructs like this are often generated by macros
that implement binding constructs like let
---this one is
equivalent to (let ((x 2)) (+ x x)).)
In this case, we can tell from the fact that the lambda
expression
appears in the function position that the closure can't "escape"
and have anything weird done with it. That is, no pointer to the
closure is assigned into a variable binding, or passed to a
procedure call, or inserted into a data structure. It's clear
that the only thing that can happen to this closure is that it
will be called, and then the pointer to it will be "dropped," i.e.,
not passed anywhere else. The closure will therefore become
garbage immediately after it's executed.
A smart compiler will therefore recognize that all the closure
really does is bind its variable and execute its body; it will
leave out the code to create the closure and just compile in
the equivalent code--in this case, it will generate the obvious
code for a let
expression.
(Some compilers always transform let
's and letrec
's into
lambda
combinations, and rely on their optimizers to recognize the
unnecessary lambda
's and remove them. This may seem backwards,
but it's nice because the same optimizations work whether the
lambda combinations were the result of transforming a let
,
or macroexpanding a user-defined macro, or written directly
by the user, or whatever. The more sophisticated the optimizer,
the more simply the user can write macros and procedures, and
expect the compiler to sort it all out and generate efficient
code.)
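Concretely, the transformation in question is the standard let-to-lambda rewrite:
(let ((x 2) (y 3))    ; a let expression...
  (+ x y))
;; ...is treated as a lambda applied to the initial values:
((lambda (x y) (+ x y)) 2 3)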
Another simple case for closure and environment analysis is binding
environments that don't have any closures created in their scopes.
Suppose that our compiler inlines calls to car
, eq?
,
and cdr
, and consider the expression
(let ((x (car a))) (if (eq? (car x) target) (car (cdr x)) #f))
In this case, the body of the let
can be compiled into entirely
inline code, and it is clear that there is no possible path
of execution that can create a closure that captures x
. x
can therefore be allocated in a register for its whole lifetime,
making this code much faster.
[ separate section? Figure out structure here... ]
Actually, some of these analyses are trickier than they appear,
due to the presence of side effects and call/cc
.
[ Haven't talked about call/cc
yet! ]
Consider the following expression, where we don't assume any inlining:
(let ((x (car a))) (if (eq? (car x) target) (car (cdr x)) (set! x (foo))))
At first it appears that since there are no lambda
's in the
expression, x
can be allocated in a register, and saved in
continuations across calls. (E.g., when calling car
, we could
just save the value of x
in the continuation and have it restored
when car
returns, right?)
Unfortunately, if we don't have any guarantees that car
won't be
redefined in weird ways, then it's possible that the call will be
to a procedure that will (directly or indirectly) call call/cc
,
and capture a continuation that could be used to return into this
procedure any number of times. In that case, we can't be sure that
we won't return into this code and modify x
. If we did, then each
time we returned into this environment, we should see the latest
value of x
. This will happen if the value of x
is in a
normal binding environment on the heap, but not if it's in a register
that gets saved in a continuation. Recall that when we restore a
continuation, we just copy the values out into the registers. If we
restore the same continuation multiple times, we'll just keep copying
the same value of x
back out.
To get this right, we have to ensure that if there are any assignments
to x
, then all references to x
go through a pointer to a
heap-allocated binding. Then when we save a continuation, we save this
pointer to the binding of x
, not the state (value) of the
binding of x
. Every time we set or read the value of x
, we
go through this indirection to the same binding, and see the latest value.
Because of this, high-tech Scheme compilers keep track of which
variables are ever set!
anywhere in their scopes, and make sure to
allocate those variables' bindings on the heap.
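This transformation is usually called assignment conversion. Here's a minimal source-level sketch, using a pair as the heap-allocated "box"; a real compiler would more likely use a dedicated one-slot cell and do the rewrite on intermediate code rather than on source text.
;; Before: x is assigned somewhere in its scope.
(let ((x (car a)))
  (if (eq? (car x) target)
      (car (cdr x))
      (set! x (foo))))

;; After assignment conversion: every read of x goes through the box,
;; and the set! becomes a write through the box.
(let ((x-box (cons (car a) '())))
  (if (eq? (car (car x-box)) target)
      (car (cdr (car x-box)))
      (set-car! x-box (foo))))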
In Scheme, it is a common idiom to code iteration as recursion;
macros for different looping constructs often compile into letrec
's
with tail-calling lambda
expressions.
While this is a very powerful framework for expression various
patterns of iteration, a naive implementation is slow. In most
cases, loops created in this way are actually just used as loops,
and it is desirable to compile away the overhead of closure creation
and calling. For example, consider a named let
like
(let loop ((x 0)) <body> (if (< x 10) (loop (+ x 1))))
that has been transformed to
(letrec ((loop (lambda (x)
                 <body>
                 (if (< x 10)
                     (loop (+ x 1))))))
  (loop 0))
We can look at this expression, and if no reference to the variable
loop
occurs in the <body> expression, we can tell that we can
compile it as a loop.
The analysis here is just slightly more complicated than the one
that allows us to optimize closures that are produced by lambda
expressions in function position of a combination.
When compiling the let
, we can keep track of each let
variable
and see whether it is ever used as anything but the name of
a procedure to tail-call--if the value of loop is never assigned,
and never read except to call it, then we know that the "calls"
to loop don't really need to be full-blown closure calls at all.
We can inline the code for the body of the loop and compile these
calls as jumps directly to that code.
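In intermediate-code terms (using the same made-up instruction names as the earlier listing, and eliding the code for the body), the effect is that the "call" to loop becomes an ordinary backward branch:
"loop35"
... code for <body>, with x kept in a register ...
... code for the (< x 10) test ...
(branch-on-false "loopexit35")
... code to put (+ x 1) back in x's register ...
(branch "loop35")                ; the "call" to loop is just a jump
"loopexit35"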
FOOD FOR THOUGHT--does it matter whether the calls are tail-calls or not, if we just treat them as procedure calls to a known address, and go ahead and save a continuation with the right label?
Notice that register closure analysis is particularly important
for loop control variables and variables for little let
's inside
loops. Because Scheme requires that a loop variable be bound
again (to fresh memory) at each iteration of a loop, actually
allocating them on the heap is expensive. If it can be determined
that the variable is dead at the end of the loop, however, then
the variable can be re-bound at each iteration by simply
re-using the same register. (We're binding the name to a particular
piece of memory--the register--over and over again, and it just
happens that these conceptual rebindings incur no runtime cost.)
With good closure analysis, loop conversion, and register allocation, a Scheme compiler can compile "normal" loops into code that's just as efficient as the loop code produced by compilers for conventional languages.
Besides the optimizations described above, conventional compiler optimizations are applicable to programs written in languages like Scheme.
Just as in a FORTRAN or C compiler, data flow analysis and control flow analysis can let the compiler simplify intermediate code and produce better machine code.
Inlining and closure analysis can greatly reduce the amount of heap allocation in a Scheme implementation. Allocating all binding environments and continuations on the heap may inflate allocation rates by an order of magnitude over the rate of allocation of normal data structures like pairs and vectors. With a simple compiler and garbage collector, this can greatly inflate garbage collection costs. Despite the high rate at which continuations and environments are allocated, there are typically relatively few of them live at any given time--the vast majority of them are used very, very briefly and then become garbage.
Inlining procedure calls may greatly reduce the allocation of continuations, and closure analysis may allow most bindings to be allocated in registers instead of on the heap.
Still, it may be desirable to keep most of the continuations and environments from making it to the normal garbage-collected heap.
A stack cache
is an area of memory (or pool of discontiguous chunks
of memory) that's used for initial allocation of continuations
and/or binding environments, in the expectation that most of them
will die quickly. A stack cache caches part of the continuation
chain; it's called a stack cache because it behaves mostly like
a stack. Stack caches may be used for continuatons, with environments
still being allocated on the heap, or a more complex design may be
used to keep most environments from making it to the heap as well.
For the most part, a stack cache is treated like a stack, in that
continuations are pushed and popped as though it were a stack. When
a continuation is captured by call/cc
, however, the continuation
chain is first moved to the heap so that it can be preserved in the
usual way. This is generally a good tradeoff, because call/cc
is not typically executed very often, and the stack cache can behave like a
stack most of the time. The large majority of continuations will
be reclaimed very quickly, by popping the stack cache, while a
small minority will be moved out to the normal heap.
Caching binding environments is a little trickier, but the basic principle is the same; most environments are created in the stack cache, and only moved to the garbage-collected heap when necessary, i.e., when a closure is created on the heap. At that moment, the environment is moved to the heap, one frame at a time, until a frame is reached that is already on the heap. (The code that does this must ensure that an environment is never copied to the heap twice, destroying the sharing of outer environments by inner environments created in their scope.)
It is not clear how desirable a stack cache for environments is, given a compiler that does a reasonably good job of closure analysis. Using a stack cache for environments makes closure creation slower, and if most of the short-lived environments have been eliminated by closure analysis and register allocation, it may not be worth it.
There is also some controversy about whether stack caches are worthwhile in general, or whether a generational garbage collector will take care of the large volume of short-lived data efficiently.
One interesting point is that a stack cache really is a kind of generational garbage collection scheme, which exploits the typically short lifetimes of particular kinds of data. (When environments and continuations are moved to the normal heap, that can be viewed as moving objects from one generation to the next. This special generation is cheaper than a normal generational scheme, however, because of the stereotyped structures of continuation chains and binding environments.)
A stack cache, because it's small, can reduce the amount of memory that is used very frequently, compared to a generational GC without a stack cache. (A stack cache may only be a few kilobytes, but the youngest generation of a generational GC may be hundreds of kilobytes, or megabytes.) For some cache architectures, frequent reuse of this large an area causes significant cache miss penalties. (For some other architectures, the misses still occur but the cost is surprisingly low. I believe that stack caches are nonetheless a good idea, because they never hurt much and may sometimes help a lot.)
Scheme-48 has a stack cache that caches both continuations and binding environments. RScheme has a stack cache for continuations only, and relies on the compiler to compile away most heap allocation of binding environments. (This may not currently be as effective as it should be--the compiler needs more testing and improvement before it will generate really good code.)