An Introduction to Scheme and its Implementation

Symbols

Symbols are like strings in that they have a character sequence. They differ, however, in that only one symbol object can have any given character sequence. The character sequence is called the symbol's print name. A print name is not the same thing as a variable name, however; it's just the character sequence that identifies a particular unique symbol. It's called the print name because that's what's printed out when you display the object (or write it).

Unlike strings, booleans, and numbers, symbols are not self-evaluating. To refer to a literal symbol, you have to quote it. Since print names of symbols look just like variable names, you have to tell Scheme which you mean.

If we type in the character sequence f o o with no single quote in front of it (and without double quotes around it, which would make it a string), Scheme assumes we mean to refer to a variable named foo, not the unique symbol whose print name is foo.

In interpreters and compilers, symbol objects are often used as variable names, and Scheme treats them specially. If we just type in a character sequence that's a symbol print name, and hit return, Scheme assumes that we are asking for the value of the binding of the variable with that name, if there is one.

Scheme>(define foo 10)
#void

Scheme>foo
10

If we quote the symbol name with the single quote character, Scheme interprets that as meaning we want the symbol object foo.

Scheme>'foo
foo

Since we've already defined (and bound) the variables foo1 and foo2, we can ask Scheme to look up their values.

Scheme>foo1
"foo"
Scheme>foo2
"foo"

Here we've typed in the names that we gave to variables earlier, and Scheme looked up the values of the variables.

As we've seen before, this doesn't work if there isn't a bound variable by that name. Symbols can be used as variable names, if you define the variable, but by default a symbol is just an object with a particular print name that identifies it.

If we want to refer to the symbol object foo, rather than using foo as a variable name, we can quote it, using the special quote character '. This tells Scheme not to evaluate the following expression, but to treat it as literal data.

Scheme>'foo
foo

When you type 'foo, you're telling Scheme you want a pointer to the symbol whose print name is foo. It doesn't matter whether there's a variable named foo or what its current value is: 'foo means a pointer to the unique symbol object whose print name is foo, which has nothing to do with any variable foo.
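To make the distinction concrete, here is a small sketch you can type in. It assumes only standard procedures (symbol? and eq?); the variable foo and the values given to it are just for illustration.

```scheme
(define foo 10)              ; a variable named foo, bound to 10

(symbol? 'foo)               ; #t: the quoted form is the symbol object
(symbol? foo)                ; #f: the unquoted form is the variable's value, 10
(eq? 'foo foo)               ; #f: the symbol and the number are unrelated

(set! foo "something else")  ; change the variable's value
(eq? 'foo 'foo)              ; #t: still the same symbol, whatever the variable holds
```

Changing the value of the variable foo has no effect on the symbol foo, and quoting foo never looks at the variable at all.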

The first time you type in a symbol name, Scheme constructs a symbol object with that character sequence, and puts it in a special table. If you later type in a symbol name with the same character sequence, Scheme notices that it's the same sequence. Instead of constructing a new object, as it would for a string, it just finds the old one in the table and uses that. It gives you a pointer to the same object, instead of a pointer to a new one.

Try this:

Scheme>(define bar1 'bar)
#void
Scheme>(define bar2 'bar)
#void
Scheme>(eq? bar1 bar2)
#t

Here, when we typed in the first definition, Scheme created a symbol object with the character sequence b a r, and added it to its table of existing symbols, as well as putting a pointer to it in the new variable binding bar1. When we typed in the second definition, Scheme noticed that there was already a symbol object named bar, and put a pointer to that same object in bar2 as well.

When we asked Scheme if the values of bar1 and bar2 referred to the same object, the answer was yes (#t). They both referred to the unique symbol bar; there is only one symbol by that name.
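The table lookup described above can be modeled in a few lines of Scheme. This is only a toy sketch, not how any particular Scheme system implements its symbol table: the names toy-table and toy-intern are made up for illustration, the table is a simple association list, and a freshly allocated list stands in for the symbol object.

```scheme
;; Toy model of a symbol table (hypothetical names throughout).
;; Each entry pairs a print-name string with the unique object for that name.
(define toy-table '())

(define (toy-intern name)
  (let ((entry (assoc name toy-table)))   ; assoc compares the strings with equal?
    (if entry
        (cdr entry)                       ; seen before: hand back the old object
        (let ((obj (list name)))          ; first time: build a fresh object
          (set! toy-table (cons (cons name obj) toy-table))
          obj))))

(eq? (toy-intern "bar") (toy-intern "bar"))   ; #t: the second call finds the first object
(eq? (toy-intern "bar") (toy-intern "baz"))   ; #f: different names, different objects
```

The character-by-character comparison happens once, when the name is looked up in the table. After that, any two results can be compared with eq?.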

The big advantage of symbols over strings is that comparing them is very fast. If you want to know if two strings have the same character sequence, you can use equal?, which will compare their characters until it either finds a mismatch or reaches the ends of both strings.

With symbols, you can use equal?, but you can get the same results using eq?, which is faster. Recall that eq? just compares the pointers to two objects, to see if they're actually the same object. For symbols, this works to compare the print names, too, because two symbols can have the same name only if they're the same object. You don't have to worry about symbols being equal? but not eq?.
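Here is a short sketch of the difference. It builds the strings with the standard procedure string, which returns a newly allocated string on each call, so we can be sure the two strings are separate objects. The variable names s1, s2, y1, and y2 are just for illustration.

```scheme
(define s1 (string #\b #\a #\r))   ; a newly allocated string "bar"
(define s2 (string #\b #\a #\r))   ; another newly allocated string "bar"

(equal? s1 s2)   ; #t: same character sequence
(eq? s1 s2)      ; #f: two distinct string objects

(define y1 'bar)
(define y2 'bar)

(equal? y1 y2)   ; #t
(eq? y1 y2)      ; #t: the same symbol object, so a pointer comparison is enough
```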

This makes symbols good for use as keys in data structures. For example, you can zip through a list looking for a symbol, using eq?, and all it has to do is compare pointers, not character sequences.
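For instance, the standard procedures memq and assq search a list using eq?. The following sketch uses them with symbols as keys; the list colors and its contents are made up for illustration.

```scheme
;; An association list keyed by symbols (illustrative data).
(define colors '((red . 1) (green . 2) (blue . 3)))

(memq 'c '(a b c d))        ; (c d): the tail of the list starting at the match
(memq 'z '(a b c d))        ; #f: no such symbol in the list
(assq 'green colors)        ; (green . 2): the first pair whose car is eq? to the key
(cdr (assq 'blue colors))   ; 3
```

Each step of these searches is a single pointer comparison, however long the symbols' print names are.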

Another advantage of symbols is that only one copy of a symbol's character sequence is actually stored, and all occurrences of the same symbol are represented as pointers to the same object. Each additional occurrence of a symbol thus only costs storage for a pointer.

If you're doing text processing in Scheme, e.g., writing a word processor, you probably want to use strings, not symbols. Strings support more operations that make it convenient to concatenate them, modify them, etc.

Symbols are mainly used as key values in data structures. They are keys that happen to have a convenient human-readable printed representation.

If you need to convert between strings and symbols, you can use string->symbol and symbol->string. string->symbol takes a string and returns the unique symbol with that print name, if there is one. (If there's not, and the string is a legal symbol print name, it creates one and returns it.) symbol->string takes a symbol and returns a string representing its print name. (There is no guarantee as to whether it always returns the same string object for a given symbol, or a copy with the same sequence of characters.)
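A few conversions, as a sketch. The examples are chosen so that they don't depend on whether your Scheme folds the case of symbols typed in as literals (some older systems do); the variable name s is just for illustration.

```scheme
(define s (symbol->string 'bar))   ; a string holding the print name of bar

(string? s)                        ; #t
(eq? (string->symbol s) 'bar)      ; #t: the round trip gives back the same symbol

;; Two separate strings with the same characters name the same symbol.
(eq? (string->symbol "hello") (string->symbol "hello"))    ; #t

;; Build a name with string operations, then convert it to a symbol.
(symbol? (string->symbol (string-append "foo" "bar")))     ; #t

;; Two calls give equal? strings; whether they are eq? is not guaranteed.
(equal? (symbol->string 'bar) (symbol->string 'bar))       ; #t
```

Because symbol->string may or may not hand back the same string object each time, it's safest to treat its result as read-only and compare such strings with equal?, not eq?.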
