Symbols are like strings, in that they have a character sequence.
Symbols are different, however, in that only one symbol object
can have any given character sequence. The character sequence is called
the symbol's print name. A print name is not the
same thing as a variable name, however--it's just the character
sequence that identifies a particular unique symbol. It's called the
print name because that's what's printed out when you display
the object (or write it).
Unlike strings, booleans, and numbers, symbols are not self-evaluating. To refer to a literal symbol, you have to quote it. Since print names of symbols look just like variable names, you have to tell Scheme which you mean.
If we type in the character sequence foo without double quotes
around it, Scheme assumes we mean to refer to a variable named foo,
not the unique symbol whose print name is foo.
In interpreters and compilers, symbol objects are often used as variable names, and Scheme treats them specially. If we just type in a character string that's a symbol print name, and hit return, Scheme assumes that we are asking for the value of the binding of the variable with that name--if there is one.
Scheme>(define foo 10)
#void
Scheme>foo
10
If we quote the symbol name with the single quote character, Scheme interprets
that as meaning we want the symbol object foo.
Scheme>'foo
foo
Since we've already defined (and bound) the variables foo1 and foo2,
we can ask Scheme to look up their values.
Scheme>foo1
"foo"
Scheme>foo2
"foo"
Here we've typed in the names that we gave to variables earlier, and Scheme looked up the values of the variables.
As we've seen before, this doesn't work if there isn't a bound variable by that name. Symbols can be used as variable names, if you define the variable, but by default a symbol is just an object with a particular print name that identifies it.
If we want to refer to the symbol object foo, rather than
using foo as a variable name, we can quote it, using the special
quote character '. This tells Scheme not to evaluate
the following expression, but to treat it as literal data.
Scheme>'foo
foo
When you type 'foo, you're telling Scheme you want a pointer
to the symbol whose print name is foo. It doesn't matter
whether there's a variable named foo or what its current
value is--'foo means a pointer to the unique symbol object whose
print name is foo, which has nothing to do with any variable foo.
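To make the distinction concrete, here is a small sketch you can type in. The names foo-value and foo-symbol are made up for this illustration; they just hold the two different things so we can inspect them.

```scheme
;; A variable named foo and the symbol foo are unrelated things.
(define foo 10)

;; Hypothetical helper names, used only to hold the two results.
(define foo-value  foo)    ; unquoted: look up the variable foo
(define foo-symbol 'foo)   ; quoted: the symbol whose print name is foo

foo-value                      ; 10
foo-symbol                     ; foo
(symbol? foo-symbol)           ; #t
(symbol? foo-value)            ; #f, the value of foo is a number
(eq? foo-symbol (quote foo))   ; #t, 'foo is shorthand for (quote foo)
```

Changing the value of the variable foo afterwards has no effect on what 'foo means.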
The first time you type in a symbol name, Scheme constructs a symbol object with that character sequence, and puts it in a special table. If you later type in a symbol name with the same character sequence, Scheme notices that it's the same sequence. Instead of constructing a new object, as it would for a string, it just finds the old one in the table, and uses that--it gives you a pointer to the same object, instead of a pointer to a new one.
Try this:
Scheme>(define bar1 'bar)
#void
Scheme>(define bar2 'bar)
#void
Scheme>(eq? bar1 bar2)
#t
Here, when we typed in the first definition, Scheme created a symbol
object with the character sequence bar, and added
it to its table of existing symbols, as well as putting a pointer to
it in the new variable binding bar1. When we typed in the
second definition, Scheme noticed that there was already a symbol object
named bar, and put a pointer to that same object in bar2 as well.
When we asked Scheme if the values of bar1 and bar2 referred
to the same object, the answer was yes (#t)--they both referred to
the unique symbol bar; there is only one symbol by that name.
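The table lookup described above can be imitated in ordinary Scheme. What follows is only a toy sketch, not how any particular Scheme system actually implements its symbol table: toy-table and toy-intern are hypothetical names, the table is a plain association list, and a one-element list stands in for the "symbol object".

```scheme
;; Toy interning table (hypothetical): an association list that maps
;; each name string to the one object created for that name.
(define toy-table '())

(define (toy-intern name)
  (let ((entry (assoc name toy-table)))    ; assoc compares names with equal?
    (if entry
        (cdr entry)                        ; seen before: hand back the old object
        (let ((obj (list name)))           ; first time: make a fresh object
          (set! toy-table (cons (cons name obj) toy-table))
          obj))))

(define a (toy-intern "bar"))
(define b (toy-intern "bar"))
(define c (toy-intern "baz"))

(eq? a b)   ; #t, the second request found the existing object
(eq? a c)   ; #f, a different name gets a different object
```

The character-by-character comparison happens once, at lookup time; after that, the results can be compared with eq? alone.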
The big advantage of symbols over strings is that comparing them is
very fast. If you want to know if two strings have the same
character sequence, you can use equal?, which will compare
their characters until it either finds a mismatch or reaches the ends
of both strings.
With symbols, you can use equal?, but you can get the same results
using eq?, which is faster. Recall that eq? just compares the
pointers to two objects, to see if they're actually the same object. For
symbols, this works to compare the print names, too, because two symbols
can have the same name only if they're the same object. You don't have
to worry about symbols being equal? but not eq?.
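The difference shows up if we build two strings with the same characters and compare them both ways. In this sketch the names s1, s2, y1, and y2 are made up for illustration; the strings are built with the standard string procedure so that each one is a freshly allocated object.

```scheme
;; Two separately allocated strings with the same character sequence.
(define s1 (string #\f #\o #\o))
(define s2 (string #\f #\o #\o))

(equal? s1 s2)   ; #t, same characters
(eq? s1 s2)      ; #f, two distinct string objects

;; Two references to the symbol foo.
(define y1 'foo)
(define y2 'foo)

(equal? y1 y2)   ; #t
(eq? y1 y2)      ; #t, there is only one symbol named foo
```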
This makes symbols good for use as keys in data structures. For example,
you can zip through a list looking for a symbol, using eq?, and
each test only has to compare pointers, not character sequences.
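As a small sketch of symbols used as keys, here is an association list searched with the standard procedures assq and memq, both of which compare with eq?. The table colors and the helper lookup-color are hypothetical names invented for this example.

```scheme
;; An association list keyed by symbols (example data, made up here).
(define colors
  '((red . "#ff0000") (green . "#00ff00") (blue . "#0000ff")))

;; Return the string stored under a symbol key, or #f if the key is absent.
(define (lookup-color name)
  (let ((entry (assq name colors)))   ; assq tests keys with eq?
    (if entry (cdr entry) #f)))

(lookup-color 'green)   ; "#00ff00"
(lookup-color 'pink)    ; #f

;; memq searches a plain list the same way and returns the matching tail.
(memq 'c '(a b c d))    ; (c d)
```

If the keys were strings, we would need assoc and member instead, which compare with equal? and have to look at the characters.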
Another advantage of symbols is that only one copy of each symbol's character sequence is actually stored, and all occurrences of the same symbol are represented as pointers to the same object. Each additional occurrence of a symbol thus only costs storage for a pointer.
If you're doing text processing in Scheme, e.g., writing a word processor, you probably want to use strings, not symbols. Strings support more operations that make it convenient to concatenate them, modify them, etc.
Symbols are mainly used as key values in data structures; they just happen to have a convenient human-readable printed representation.
If you need to convert between strings and symbols, you can use
string->symbol and symbol->string. string->symbol
takes a string and returns the unique symbol with that print name,
if there is one. (If there's not, and the string is a legal symbol
print name, it creates one and returns it.) symbol->string
takes a symbol and returns a string representing its print name.
(There is no guarantee as to whether it always returns the same string
object for a given symbol, or a copy with the same sequence of
characters.)
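Here is a short sketch of the two conversions. It assumes a case-sensitive Scheme, where the symbol typed as bar has the print name "bar" in lower case; on an older system that folds case, the strings involved may differ. The names sym and name are made up for the example.

```scheme
;; String to symbol: we get the one unique symbol with that print name.
(define sym (string->symbol "bar"))

(eq? sym 'bar)   ; #t, the same object we get by typing 'bar

;; It doesn't matter how the string was built, only what characters it has.
(eq? (string->symbol (string-append "b" "ar")) 'bar)   ; #t

;; Symbol to string: a string with the symbol's print name.
(define name (symbol->string 'bar))

(equal? name "bar")   ; #t (assuming a case-sensitive implementation)
```

Since there's no guarantee about which string object symbol->string returns, compare the result with equal?, not eq?.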