A program examining such a tree can decide which child it wants to follow based on the information in the node, or may choose to follow all the children.
Tree nodes that have no children are called leaf nodes.
Tree nodes that are not leaves are called internal nodes.
The number of children a node has is its degree.
The "top" node in a tree, the one leading (eventually) to all of the other nodes, is called the root node.
The depth of a node x is the number of edges on the path from the root to x. The root is said to be at depth 0; any children of the root are at depth 1, etc. The height of a tree is the maximum depth of any node. For convenience, the height of an empty tree is defined as -1. All the nodes with the same depth d are in a set called level d.
Often, we will use familial relationships (e.g., sibling, grandchild, parent, etc.) to describe relationships among tree nodes, with the obvious interpretations. Similarly, tree analogies will be used (e.g., a set of trees is a forest, a tree with lots of breadth is bushy, etc.).
In lecture 7, we saw an example of such a data structure: a parse tree, where the root represented an entire expression, internal nodes represented operations, and leaf nodes represented variable names. The tree was created by a recursive descent parser, which built the tree in a bottom up manner (i.e., from leaf nodes to the root), then evaluated it in a top down manner (i.e., from root to leaf).
Here's an example of an ordinary tree:
a Level 0 / \ / \ / \ / \ b c Level 1 / | \ | / | \ | d e f g Level 2 / \ | /|\ / \ h i j k l m n o Level 3 / \ | /|\ p q r x y z Level 4 / | \ / /|\ \ / | | | \ s t u v w Level 5
How can we represent trees? If we know the maximum degree (number of children) of any internal node, we can do something similar to a linked list, with an array of pointers to the next node, instead of just one, e.g.:

    #define MAX_DEGREE 20   /* up to 20 children */

    typedef struct _treenode {
        int k;                                    /* information */
        struct _treenode *children[MAX_DEGREE];
    } treenode;

and we can use NULL pointers to indicate no successor node.
If we don't know the maximum degree, then we can use a linked list of siblings, each of which has a link to a linked list of children:

    typedef struct _treenode {
        int k;                            /* information */
        struct _treenode *children,       /* list of children */
                         *next_sibling;   /* list of siblings */
    } treenode;

So next_sibling points to the next node in a list of siblings at the same level, and children points to a list (connected by that structure's next_sibling field) of children. (It turns out that this kind of node is a very useful general list processing structure; see the computer language LISP for more information.)
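The definitions of degree and height given earlier can be computed directly over this sibling-list representation. The following is only a sketch: the function names degree and height are hypothetical (they are not part of the notes), and the node type is repeated so the fragment stands alone.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the sibling-list node in the text. */
typedef struct _treenode {
    int k;                            /* information */
    struct _treenode *children,       /* list of children */
                     *next_sibling;   /* list of siblings */
} treenode;

/* degree (hypothetical helper): the number of children of t,
 * found by walking the child list through next_sibling. */
int degree(const treenode *t)
{
    int d = 0;
    const treenode *c;

    assert(t != NULL);
    for (c = t->children; c != NULL; c = c->next_sibling)
        d++;
    return d;
}

/* height (hypothetical helper): -1 for an empty tree, 0 for a
 * single node, otherwise one more than the tallest child subtree. */
int height(const treenode *t)
{
    int best = -1;
    const treenode *c;

    if (t == NULL)
        return -1;
    for (c = t->children; c != NULL; c = c->next_sibling) {
        int h = height(c);
        if (h > best)
            best = h;
    }
    return best + 1;
}
```

Note that finding the degree costs time proportional to the number of children, since the list has to be walked; the array representation would instead scan up to MAX_DEGREE slots.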
For example, we can represent the tree:

        1
      / | \
     2  3  4
          / \
         5   6

as

     _____
    |1| |/|
     _|___   _____   _____
    |2|/|-|-|3|/|-|-|4| |/|
                     _|___   _____
                    |5|/|-|-|6|/|/|

How do we insert into such trees? Well, it all depends on the nature of the values of keys and how they are related to the concept of successors to a node. That is, until we know what a key means and why we want to stick it into a tree, it doesn't make sense to talk about how to insert it into a tree. An easy example of insertion is binary search trees.
1 / \ 2 3 / / \ 4 5 6 \ 7

What is the height of this tree? What are the leaves? The internal nodes?
A special case of a binary tree is a complete binary tree. A complete binary tree is one in which every internal node has exactly two children and all of the leaves are at the same depth.
For example, this is a complete binary tree:

           1
          / \
         /   \
        2     3
       / \   / \
      4   5 6   7

Complete binary trees are important because they allow us to explore certain aspects of binary trees in a simple context.
How many leaf nodes are there in a complete binary tree of height h? Notice that, whatever the answer is at height h, it is twice the value for height h-1. We can look at this as a recursive function:

    f(h) = 1,           if h = 0 (i.e., just the root node by itself), or
           2 f(h-1),    otherwise.

This adds up to exponentiation, i.e., f(h) is just 2^h. In class, we'll use this fact to figure out how many total nodes there are in a tree of height h. It turns out to be 2^(h+1) - 1.
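The recurrence for the number of leaves, and the total node count it leads to, can be checked numerically. This is only an illustrative sketch; the names leaves and total_nodes are hypothetical. The total is obtained by adding the sizes of levels 0 through h, since level d of a complete binary tree has f(d) nodes.

```c
#include <assert.h>

/* leaves (hypothetical name): f(h) from the recurrence,
 * f(0) = 1 and f(h) = 2 f(h-1) for h > 0. */
unsigned long leaves(int h)
{
    assert(h >= 0);
    if (h == 0)
        return 1;
    return 2 * leaves(h - 1);
}

/* total_nodes (hypothetical name): nodes in a complete binary tree
 * of height h, summing the size of each level from 0 to h. */
unsigned long total_nodes(int h)
{
    unsigned long n = 0;
    int d;

    assert(h >= 0);
    for (d = 0; d <= h; d++)
        n += leaves(d);
    return n;
}
```

For instance, leaves(2) is 4 and total_nodes(2) is 7, matching the seven-node complete tree of height 2 shown above.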
What is the maximum height of a binary tree with n nodes? If the tree is stretched out like a linked list, e.g.

          1
         /
        2
       /
      3
     /
   ...

then the maximum height is just n-1. It will turn out later that we don't like trees like this; we prefer short trees.
A more interesting question is, what is the minimum height of a binary tree with n nodes? To answer this question, we must determine what configuration of nodes will yield the minimum height. The tree must be almost complete, that is, a complete binary tree with possibly some of its leaves missing. An almost complete binary tree can have any number of nodes in it (proof of that left as an exercise :-), so we can consider this kind of tree without loss of generality.
The height of an almost complete binary tree with height h is the same as the height of a complete binary tree with height h, and one more than that of a complete binary tree with height h-1.
How many nodes in a complete binary tree of height h-1? Exactly 2^h - 1. So an almost complete binary tree of height h must have between 2^h nodes, e.g.:

         1
        / \          h = 2, n = 2^2 = 4
       2   3
      /
     4

and 2^(h+1) - 1 nodes, e.g.:

         1
        / \          h = 2, n = 2^(2+1) - 1 = 7
       2   3
      / \ / \
     4  5 6  7

That is, n must be equal to something between 2^h and 2^(h+1) - 1; any almost complete binary tree with n in these limits will have height h. So a binary tree with n nodes has a minimum height of h = floor(log2 n) (floor(x) means the largest integer less than or equal to x).
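The minimum height can be computed with integer arithmetic alone, by counting how many times n can be halved. The sketch below uses hypothetical names (min_height, within_bounds) and assumes n fits in 31 bits for the bounds check.

```c
#include <assert.h>

/* min_height (hypothetical name): floor(log2 n) for n >= 1, found by
 * repeated halving.  Returns -1 for n == 0, which agrees with the
 * convention that the empty tree has height -1. */
int min_height(unsigned long n)
{
    int h = -1;

    while (n > 0) {
        n /= 2;
        h++;
    }
    return h;
}

/* within_bounds (hypothetical name): checks the inequality from the
 * text, 2^h <= n <= 2^(h+1) - 1, for h = min_height(n).
 * Assumes 1 <= n < 2^31 so that the shifts cannot overflow. */
int within_bounds(unsigned long n)
{
    int h;

    assert(n > 0 && n <= 0x7FFFFFFFUL);
    h = min_height(n);
    return (1UL << h) <= n && n <= (1UL << (h + 1)) - 1;
}
```

For example, min_height(4) and min_height(7) are both 2, as in the two trees above, while min_height(8) is 3.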
How does this help us? Suppose we have important information in a data structure, and we want to find it. We know it is along a path from the beginning to a terminal node (leaf or end of list). If we're talking about a linked list, going from the head to the end of the list takes Θ(n) time. In an almost complete binary tree, going from the root to a leaf takes Θ(log n) time. Since log n is much less than n (e.g., log2 1,000,000,000 is about 30), we can exploit this property to create efficient searchable data structures.
Then a binary search tree is a binary tree that is:

         10
        /  \
       5    12
      / \     \
     1   6     13

We'll think of other examples in class.
Searching in a binary search tree is simply binary search. If what you're looking for isn't in the current node, and it is less than the current node, look in the left subtree. Otherwise, look in the right subtree. If you get to a point where there's nowhere left to go, the item isn't in the tree.
Let's look at an implementation of binary search trees, with integer keys and the natural "less than or equal to" total ordering. The tree node structure has space for the key and pointers to the left and right subtrees, just like a normal binary tree. The nodes are unfortunately, but intuitively, called bstreenodes:

    typedef struct _bstreenode {
        int k;                        /* the key */
        struct _bstreenode *left,     /* left subtree */
                           *right;    /* right subtree */
    } bstreenode, *bstree;

    /* t is passed by value (a bstree, not a bstree *),
     * since searching never changes the tree */
    int *search_bstree (bstree t, int k)
    {
        while (t) {
            if (t->k == k) return &(t->k);  /* found it */
            if (k <= t->k) t = t->left;     /* go left */
            else t = t->right;              /* go right */
        }
        return NULL;
    }

When the function reaches an empty subtree that should include k, it returns NULL. If it ever finds an internal node or leaf containing k, it returns a pointer to the k field there.
How could we write this recursively?
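One possible answer, offered as a sketch rather than the official solution: each step of the loop becomes a recursive call on the chosen subtree, and the empty tree is the base case. The name search_bstree_rec is hypothetical, and the node type is repeated so the fragment stands alone.

```c
#include <stddef.h>

typedef struct _bstreenode {
    int k;                        /* the key */
    struct _bstreenode *left,     /* left subtree */
                       *right;    /* right subtree */
} bstreenode, *bstree;

/* search_bstree_rec (hypothetical name): recursive search.
 * Returns a pointer to the matching key, or NULL if k is absent. */
int *search_bstree_rec(bstree t, int k)
{
    if (t == NULL)
        return NULL;                           /* nowhere left to go */
    if (t->k == k)
        return &(t->k);                        /* found it */
    if (k <= t->k)
        return search_bstree_rec(t->left, k);  /* go left */
    return search_bstree_rec(t->right, k);     /* go right */
}
```

Both recursive calls are tail calls, which is why the iterative version needs no stack.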
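
    #include <stdlib.h>     /* for malloc */

    void insert_bstree (bstree *t, int k)
    {
        bstree p;

        /* build the new leaf */
        p = (bstree) malloc (sizeof (bstreenode));
        if (!p) return;     /* out of memory; tree is left unchanged */
        p->left = NULL;
        p->right = NULL;
        p->k = k;

        /* walk down to the empty subtree where k belongs */
        while (*t) {
            if (k <= (*t)->k)
                t = &(*t)->left;
            else
                t = &(*t)->right;
        }
        *t = p;
    }

How can we do this recursively? Is it any easier?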
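A recursive version might look like the sketch below. The name insert_bstree_rec and the int return value (0 on success, -1 if malloc fails) are assumptions added for illustration; the node type is repeated so the fragment stands alone.

```c
#include <stdlib.h>

typedef struct _bstreenode {
    int k;                        /* the key */
    struct _bstreenode *left,     /* left subtree */
                       *right;    /* right subtree */
} bstreenode, *bstree;

/* insert_bstree_rec (hypothetical name): recursive insertion.
 * An empty subtree is replaced by a new leaf; otherwise the call
 * recurses into the subtree where k belongs.
 * Returns 0 on success, -1 if memory could not be allocated. */
int insert_bstree_rec(bstree *t, int k)
{
    if (*t == NULL) {
        bstree p = malloc(sizeof *p);
        if (p == NULL)
            return -1;
        p->k = k;
        p->left = NULL;
        p->right = NULL;
        *t = p;
        return 0;
    }
    if (k <= (*t)->k)
        return insert_bstree_rec(&(*t)->left, k);
    return insert_bstree_rec(&(*t)->right, k);
}
```

Starting from an empty tree (bstree t = NULL;), calling insert_bstree_rec(&t, k) with the keys 10, 5, 12, 1, 6, 13 in that order builds the example binary search tree shown earlier. Whether this is easier is a matter of taste: the node is allocated only once the right spot is found, but the pointer-to-pointer argument is still needed.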

    void traverse_tree (bstree t)
    {
        if (!t) return;
        traverse_tree (t->left);
        printf ("%d\n", t->k);
        traverse_tree (t->right);
    }

That was too easy. Now let's see a nonrecursive version of that function. It will assume some implementation of stacks of tree node pointers:

    void traverse_nr (bstree t)
    {
        do {
            /* go as far left as possible, remembering the path */
            while (t) {
                push (t);
                t = t->left;
            }
            /* visit the most recent node, then try its right subtree */
            if (!empty_stack()) {
                t = pop ();
                printf ("%d\n", t->k);
                t = t->right;
            }
        } while (t || !empty_stack());
    }

Which one do you like better? The stack-based one can be more efficient, but hard to read and maintain. So, unless you want to squeeze every ounce of computing power out of the machine, I suggest you stick with the recursive version.
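The nonrecursive traversal relies on push, pop, and empty_stack without defining them. One minimal way to supply them, assuming a fixed capacity (STACK_MAX is a made-up limit), is an array-based stack. To make the behavior easy to check, the sketch also includes collect_inorder, a hypothetical variant of traverse_nr that stores keys in an array instead of printing them.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _bstreenode {
    int k;                        /* the key */
    struct _bstreenode *left,     /* left subtree */
                       *right;    /* right subtree */
} bstreenode, *bstree;

#define STACK_MAX 1024            /* assumed capacity, not from the notes */

static bstree stack[STACK_MAX];   /* the stack of tree node pointers */
static int top = 0;               /* number of items on the stack */

void push(bstree t)
{
    assert(top < STACK_MAX);      /* overflow is a programming error here */
    stack[top++] = t;
}

bstree pop(void)
{
    assert(top > 0);              /* callers must check empty_stack first */
    return stack[--top];
}

int empty_stack(void)
{
    return top == 0;
}

/* collect_inorder (hypothetical): the same loop as traverse_nr, but
 * keys go into out[] (at most max of them).  Returns the count stored. */
int collect_inorder(bstree t, int out[], int max)
{
    int n = 0;

    do {
        while (t) {
            push(t);
            t = t->left;
        }
        if (!empty_stack()) {
            t = pop();
            if (n < max)
                out[n++] = t->k;
            t = t->right;
        }
    } while (t || !empty_stack());
    return n;
}
```

The stack never holds more than one node per level of the tree, so a capacity of height + 1 is enough; a stretched-out tree of n nodes can need all n slots.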
There are other kinds of traversals for trees. For instance, suppose we want to delete all the nodes in a tree, to free up storage when we're done. We can try to use an inorder traversal:

    void delete_tree (bstree t)
    {
        if (!t) return;
        delete_tree (t->left);
        free (t);
        delete_tree (t->right);     /* oops: t has already been freed */
    }

but what is wrong with this? The free is done before we refer to the right subtree; reading t->right after free (t) is undefined behavior and will probably result in a segmentation fault. If we reverse the order of the last two statements, the problem is taken care of. We are no longer visiting the nodes in order, but who cares, since we're deleting them anyway? This is called doing a postorder traversal. And a preorder traversal is where the node is visited before the left and right subtrees.
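To make the two other orders concrete, here is a sketch of the corrected postorder deletion alongside a preorder traversal. The names delete_tree_postorder and collect_preorder are hypothetical; returning a count from the deletion is an addition made only so the result can be checked.

```c
#include <stdlib.h>

typedef struct _bstreenode {
    int k;                        /* the key */
    struct _bstreenode *left,     /* left subtree */
                       *right;    /* right subtree */
} bstreenode, *bstree;

/* delete_tree_postorder (hypothetical name): frees both subtrees
 * first, then the node itself, so no freed node is ever read.
 * Returns the number of nodes freed. */
int delete_tree_postorder(bstree t)
{
    int n;

    if (!t)
        return 0;
    n = delete_tree_postorder(t->left);
    n += delete_tree_postorder(t->right);
    free(t);
    return n + 1;
}

/* collect_preorder (hypothetical name): visits the node before its
 * subtrees, storing keys in out[] starting at index n.  The caller
 * must make out[] large enough for every node in the tree.
 * Returns the index just past the last key stored. */
int collect_preorder(bstree t, int out[], int n)
{
    if (!t)
        return n;
    out[n++] = t->k;                            /* node first */
    n = collect_preorder(t->left, out, n);      /* then left */
    return collect_preorder(t->right, out, n);  /* then right */
}
```

On the six-node example tree with root 10, a preorder traversal yields 10, 5, 1, 6, 12, 13: each subtree's root appears before anything below it.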

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct _gbstreenode {
        void *k;                     /* pointer to some stuff,
                                      * we don't know what.  the user has
                                      * to provide storage here. */
        struct _gbstreenode *left,   /* left subtree */
                            *right;  /* right subtree */
    } gbstreenode, *gbstree;

    /* search a general tree */
    void *search_gbstree (gbstree t, void *k, int (*compar)(void *, void *))
    {
        /* compar is a function of two pointers a and b, returning
         * < 0 if *a < *b, 0 if *a == *b, and > 0 if *a > *b */
        int c;

        while (t) {
            /* same argument order as in insert_gbstree */
            c = compar (k, t->k);
            if (c == 0) return t->k;    /* found it */
            if (c < 0) t = t->left;     /* go left */
            else t = t->right;          /* go right */
        }
        return NULL;
    }

    /* insert a node */
    void insert_gbstree (gbstree *t, void *k, int (*compar)(void *, void *))
    {
        gbstree p;
        int c;

        p = (gbstree) malloc (sizeof (gbstreenode));
        if (!p) return;     /* out of memory; tree is left unchanged */
        p->left = NULL;
        p->right = NULL;
        p->k = k;
        while (*t) {
            c = compar (k, (*t)->k);
            if (c < 0) t = &(*t)->left;
            else t = &(*t)->right;
        }
        *t = p;
    }

    /* traverse a tree */
    void gbsinorder_traverse_tree (gbstree t, void (*visit)(void *))
    {
        /* visit is a function accepting a pointer, doing
         * something (we don't care what) to the data it points to */
        if (!t) return;
        gbsinorder_traverse_tree (t->left, visit);
        visit (t->k);
        gbsinorder_traverse_tree (t->right, visit);
    }

    void gbspostorder_traverse_tree (gbstree t, void (*visit)(void *))
    {
        /* visit is a function accepting a pointer, doing
         * something (we don't care what) to the data it points to */
        if (!t) return;
        gbspostorder_traverse_tree (t->left, visit);
        gbspostorder_traverse_tree (t->right, visit);
        visit (t->k);
    }

    /* the 'visit' function for a tree of strings.
     * just prints out the string */
    void print_string (char *s)
    {
        fputs (s, stdout);
    }

    /* strcmp(), already in the standard C library, serves for compar */

    /* this program reads in strings, then prints them out in sorted order */
    int main (void)
    {
        char s[100], *p;
        gbstree t;

        /* empty tree */
        t = NULL;

        /* loop until end of file (fgets returns NULL) */
        while (fgets (s, sizeof s, stdin)) {

            /* strdup() uses malloc to duplicate the string,
             * so it will have its own storage in the tree */
            p = strdup (s);
            if (!p) break;

            /* insert, casting strcmp atrociously */
            insert_gbstree (&t, p, (int (*)(void*,void*))strcmp);
        }

        /* traverse tree, casting print_string */
        gbsinorder_traverse_tree (t, (void(*)(void*))print_string);

        /* traverse in postorder, freeing each string; visit only sees
         * the keys, so the nodes themselves are not freed here */
        gbspostorder_traverse_tree (t, (void(*)(void*))free);
        return 0;
    }
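The function-pointer casts in main can be avoided by writing small wrappers whose types match compar and visit exactly. This is only a sketch of that alternative; the names compare_strings and print_string_visit are hypothetical and do not appear in the program above.

```c
#include <stdio.h>
#include <string.h>

/* compare_strings (hypothetical name): a compar for trees of strings.
 * Same contract as strcmp, but with the void * parameters that
 * insert_gbstree and search_gbstree expect. */
int compare_strings(void *a, void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* print_string_visit (hypothetical name): a visit function for
 * trees of strings; just prints out the string. */
void print_string_visit(void *s)
{
    fputs((const char *)s, stdout);
}
```

With these, the calls would read insert_gbstree (&t, p, compare_strings) and gbsinorder_traverse_tree (t, print_string_visit), with no casts. The standard free already takes a void *, so it can be passed as the visit function directly.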