Content: Recursive Lists, Tree, Tree Traversals, Binary Search Tree (BST), The BinarySearchTree Interface, BST-Based Set
University of Michigan at Ann Arbor
Last Edit Date: 04/09/2023
Disclaimer and Terms of Use:
We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file.
This personal note is adapted from Professor Amir Kamil, Andrew DeOrio, James Juett, Sofia Saleem, and Saquib Razak. Please contact us to delete this file if you think your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
The data representation of a linked list is an example of structural recursion, since the Node type uses itself in its own representation:
    struct Node {
      int datum;
      Node *next;
    };
This representation satisfies the requirements for a recursive abstraction:
The empty list is the base case, represented by a null pointer.
A non-empty list is a recursive case; it can be subdivided into the first node and the rest of the list, which represents a list in its own right.
Independent of its representation, a linked list can actually be defined recursively as either an empty list, or a datum followed by a smaller list.
Given the recursive definition of a list, it is natural to process a list with a recursive function. The base case of the recursive function will be the minimum-size list allowed by the function, and larger lists will be handled in the recursive case. As an example, the following is a recursive function to compute the length of a list:
    // REQUIRES: list represents a valid list
    // EFFECTS: Computes the length of the list that starts at the
    //          given node.
    int length(const Node *list) {
      if (list == nullptr) {            // empty list
        return 0;
      } else {                          // non-empty list
        return 1 + length(list->next);  // list->next is a smaller list
      }
    }
The length of an empty list is 0. The length of a non-empty list is one more than the length of the rest of the list; the elements in the list consist of the initial datum and the elements in the rest of the list. We use the length() function itself as an abstraction to compute the number of elements in the remainder of the list, taking the recursive leap of faith that it will compute the right answer.
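To make the recursion concrete, the following is a minimal sketch that calls length() on a three-element list. The helper name example_length() and the choice to build the nodes on the stack are assumptions made for illustration; they are not part of the original notes.

```cpp
#include <cassert>

struct Node {
  int datum;
  Node *next;
};

// Same recursive definition as in the notes.
int length(const Node *list) {
  if (list == nullptr) {  // empty list
    return 0;
  }
  return 1 + length(list->next);  // first node plus the rest
}

// Hypothetical helper: builds the list 3 -> 1 -> 4 and returns its length.
int example_length() {
  Node third = {4, nullptr};
  Node second = {1, &third};
  Node first = {3, &second};
  // length(first) = 1 + length(second) = 1 + (1 + length(third))
  //               = 1 + (1 + (1 + length(nullptr))) = 3
  int result = length(&first);
  assert(result == 3);
  return result;
}
```

Each call handles exactly one node and delegates the rest of the list to the recursive call, mirroring the definition of a list as a datum followed by a smaller list.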
Another example is to find the maximum element in a list. Unlike for length(), the minimal list is not an empty one, since an empty list has no elements in it. Instead, the minimum required size is a list with a single datum, in which case the maximum element is just that lone datum, constituting our base case. For a larger list, we can break it down recursively as follows:
Find the maximum element in the rest of the list. This element is at least as large as any other element in the rest of the list.
Compare the first element to the max of the remainder. The larger of the two is transitively at least as large as the elements in the rest of the list.
The following implements this algorithm:
    #include <algorithm>  // std::max

    // REQUIRES: list represents a valid, non-empty list
    // EFFECTS: Returns the maximum element in the list that starts at
    //          the given node.
    int list_max(const Node *list) {
      if (list->next == nullptr) {              // list has only one element
        return list->datum;
      } else {                                  // list has more than one element
        return std::max(list->datum,            // compare first datum to
                        list_max(list->next));  // max of rest of list
      }
    }
The base case is a list with one element. Such a list has an empty next list, so we check for that and return the list's lone datum.
The recursive case computes the max of the rest of the list and then uses std::max() to determine the maximum of that item and the first item in the list.
As always, we take the recursive leap of faith, assuming that the recursive call to list_max() computes the right answer for the smaller list.
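As a quick check of the reasoning above, here is a small sketch that runs list_max() on the list 3 -> 9 -> 4. The helper example_list_max() is hypothetical, and the assert() on the REQUIRES clause is an addition for illustration rather than part of the original function.

```cpp
#include <algorithm>
#include <cassert>

struct Node {
  int datum;
  Node *next;
};

int list_max(const Node *list) {
  assert(list != nullptr);  // makes the REQUIRES clause explicit
  if (list->next == nullptr) {
    return list->datum;  // base case: a single element
  }
  return std::max(list->datum, list_max(list->next));
}

// Hypothetical helper: builds 3 -> 9 -> 4 and returns its maximum.
int example_list_max() {
  Node third = {4, nullptr};
  Node second = {9, &third};
  Node first = {3, &second};
  // list_max(3->9->4) = max(3, list_max(9->4))
  //                   = max(3, max(9, list_max(4)))
  //                   = max(3, max(9, 4)) = 9
  return list_max(&first);
}
```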
A tree is a (possibly non-linear) data structure made up of nodes (or vertices) and edges, with exactly one path from the root node to any given node.
The tree with no nodes is called the null or empty tree. A tree that is not empty consists of a root node and potentially many levels of additional nodes that form a hierarchy.
A binary tree is a tree that is either empty (or null) or one in which each node has at most two children, a left subtree and a right subtree.
The term tree stems from the fact that its branching structure resembles that of a botanical tree. Terminology with respect to tree data structures borrows from both botanical and family trees.
The root is the node that originates the branching structures. In our diagrams, the root is pictured as the top of the tree.
A non-empty tree consists of a parent node and its child nodes. For a binary tree, there is a left child and a right child, either of which may be empty. Nodes that have the same parent are siblings.
A node whose children are all empty is a leaf.
The size of a tree is the number of elements it contains.
The height of a tree is the number of levels at which it has elements. Equivalently, it is the number of nodes on the longest path from the root to a leaf node.
Algorithms on trees are often written as tree-recursive functions, in which the recursive case makes more than one recursive call. The general strategy is to directly compute the result for the smallest tree allowed by the function, constituting the base case. The recursive case makes recursive calls on the subtrees and combines their results to compute the answer for the whole tree.
As an example, the following algorithm computes the size of a tree:
The size of an empty tree is zero
The size of a non-empty tree is the size of the left child, plus the size of the right child, plus one for the root datum.
We use a Node struct, which stores three member objects (datum, left, and right), to achieve this:
    struct Node {
      int datum;
      Node *left;
      Node *right;
    };
Note: We use a null pointer to represent an empty tree.
Function size()
    // REQUIRES: tree represents a valid tree
    // EFFECTS: Returns the number of elements in the tree represented
    //          by the given node.
    int size(const Node *tree) {
      if (!tree) {  // empty tree
        return 0;
      } else {      // non-empty tree
        return 1 + size(tree->left) + size(tree->right);
      }
    }
Function height()
The height of a tree is the number of levels it contains. An empty tree contains no levels, so its height is zero. For a non-empty tree, we exploit the alternate definition of height, that it is the length of the longest path from root to leaf. The longest such path is just one node longer than the longest path in the child subtrees, since the root adds one additional node to the part of the path contained in a child. Thus, we compute the height as follows:
    #include <algorithm>  // std::max

    // REQUIRES: tree represents a valid tree
    // EFFECTS: Returns the height of the tree represented by the given
    //          node.
    int height(const Node *tree) {
      if (!tree) {  // empty tree
        return 0;
      } else {      // non-empty tree
        return 1 + std::max(height(tree->left),
                            height(tree->right));
      }
    }
Note: We use std::max() to obtain the longer path from the two child trees, then add one to the result to account for the root node.
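The difference between size and height is easiest to see on a tree that is not full. The sketch below applies both functions to a small hand-built tree; the tree shape and the helper name example_size_height() are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

struct Node {
  int datum;
  Node *left;
  Node *right;
};

int size(const Node *tree) {
  if (!tree) {
    return 0;
  }
  return 1 + size(tree->left) + size(tree->right);
}

int height(const Node *tree) {
  if (!tree) {
    return 0;
  }
  return 1 + std::max(height(tree->left), height(tree->right));
}

// Hypothetical helper. The tree has root 6 with children 4 and 2,
// and node 4 has a single left child 9. That gives four elements
// spread over three levels.
bool example_size_height() {
  Node n9 = {9, nullptr, nullptr};
  Node n4 = {4, &n9, nullptr};
  Node n2 = {2, nullptr, nullptr};
  Node n6 = {6, &n4, &n2};
  return size(&n6) == 4 && height(&n6) == 3;
}
```

The longest root-to-leaf path here is 6, 4, 9, which contains three nodes, so the height is 3 even though the right side of the tree has only one level below the root.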
1. Preorder Traversal: Visit the node; traverse the left subtree; traverse the right subtree.
    void print(const Node *tree) {
      if (tree) {
        std::cout << tree->datum << ", ";
        print(tree->left);
        print(tree->right);
      }
    }

Output: 6, 4, 9, 3, 2, 1, 10
2. Inorder Traversal: Traverse the left subtree; visit the node; traverse the right subtree.
    void print(const Node *tree) {
      if (tree) {
        print(tree->left);
        std::cout << tree->datum << ", ";
        print(tree->right);
      }
    }

Output: 9, 4, 3, 6, 1, 2, 10
3. Postorder Traversal: Traverse the left subtree; traverse the right subtree; visit the node.
    void print(const Node *tree) {
      if (tree) {
        print(tree->left);
        print(tree->right);
        std::cout << tree->datum << ", ";
      }
    }

Output: 9, 3, 4, 1, 10, 2, 6
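The three listed outputs are consistent with a seven-node tree whose root is 6, with left child 4 (children 9 and 3) and right child 2 (children 1 and 10). The sketch below rebuilds that tree, which is an inference from the outputs rather than something stated in the notes, and runs all three traversals. The function names preorder(), inorder(), postorder(), and traverse_example() are hypothetical, and the traversals write to a stream parameter instead of std::cout so that the result can be inspected.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Node {
  int datum;
  Node *left;
  Node *right;
};

void preorder(const Node *tree, std::ostream &os) {
  if (tree) {
    os << tree->datum << ", ";  // visit first
    preorder(tree->left, os);
    preorder(tree->right, os);
  }
}

void inorder(const Node *tree, std::ostream &os) {
  if (tree) {
    inorder(tree->left, os);
    os << tree->datum << ", ";  // visit between the subtrees
    inorder(tree->right, os);
  }
}

void postorder(const Node *tree, std::ostream &os) {
  if (tree) {
    postorder(tree->left, os);
    postorder(tree->right, os);
    os << tree->datum << ", ";  // visit last
  }
}

// Hypothetical helper: order is 'p' (pre), 'i' (in), or anything else (post).
std::string traverse_example(char order) {
  Node n9 = {9, nullptr, nullptr};
  Node n3 = {3, nullptr, nullptr};
  Node n1 = {1, nullptr, nullptr};
  Node n10 = {10, nullptr, nullptr};
  Node n4 = {4, &n9, &n3};
  Node n2 = {2, &n1, &n10};
  Node n6 = {6, &n4, &n2};
  std::ostringstream os;
  if (order == 'p') {
    preorder(&n6, os);
  } else if (order == 'i') {
    inorder(&n6, os);
  } else {
    postorder(&n6, os);
  }
  return os.str();
}
```

Note that each traversal as written also emits a trailing ", " after the last element, which the listed outputs omit.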
4. Level-order Traversal (optional): Visit the nodes one level at a time, starting at the root and moving left to right within each level.

Output: 6, 4, 2, 9, 3, 1, 10
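No code accompanies the level-order traversal, so the following is one possible sketch rather than the course's implementation. It assumes the common queue-based approach using std::queue: nodes are visited in the order they are enqueued, and each visited node enqueues its non-empty children. The names level_order() and level_order_example() are hypothetical, and the tree is the same seven-node tree inferred from the traversal outputs.

```cpp
#include <cassert>
#include <queue>
#include <sstream>
#include <string>

struct Node {
  int datum;
  Node *left;
  Node *right;
};

void level_order(const Node *tree, std::ostream &os) {
  std::queue<const Node *> pending;  // nodes discovered but not yet visited
  if (tree) {
    pending.push(tree);
  }
  while (!pending.empty()) {
    const Node *current = pending.front();
    pending.pop();
    os << current->datum << ", ";
    // Children join the back of the queue, so a whole level is
    // visited before any node of the next level.
    if (current->left) {
      pending.push(current->left);
    }
    if (current->right) {
      pending.push(current->right);
    }
  }
}

// Hypothetical helper: root 6; children 4 and 2; grandchildren 9, 3, 1, 10.
std::string level_order_example() {
  Node n9 = {9, nullptr, nullptr};
  Node n3 = {3, nullptr, nullptr};
  Node n1 = {1, nullptr, nullptr};
  Node n10 = {10, nullptr, nullptr};
  Node n4 = {4, &n9, &n3};
  Node n2 = {2, &n1, &n10};
  Node n6 = {6, &n4, &n2};
  std::ostringstream os;
  level_order(&n6, os);
  return os.str();
}
```

Unlike the other three traversals, this one is naturally iterative, because the order of visits does not follow the recursive structure of the subtrees.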
A binary search tree (BST) is a binary tree whose elements are stored in an order that maintains a sorting invariant. Specifically, a binary search tree is either:
empty, or
a root datum with two subtrees such that:
The two subtrees are themselves binary search trees
Every element in the left subtree is strictly less than the root datum
Every element in the right subtree is strictly greater than the root datum
Every element in the left subtree is less than the root datum, and every element in the right subtree is greater than the root. The left and right subtrees meet the requirements for a binary search tree. Thus, the whole tree is a binary search tree.
In the tree on the left, the left subtree contains the element 7, which is not smaller than the root element 5. This violates the second condition in the recursive definition above of a BST.
In the tree on the right, the right subtree is empty, which is a valid binary search tree. However, the left subtree is not a valid BST, since it does not meet the sorting invariant for a BST. Thus, the tree on the right is not a binary search tree.
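The recursive definition can be turned into a checker. The sketch below is an illustration, not something from the original notes: the names is_bst() and is_bst_impl() are hypothetical, and the two example trees are assumed shapes, since the figures are not reproduced here. It passes exclusive lower and upper bounds down the tree so that every element is compared against all of its ancestors, not just its parent.

```cpp
#include <cassert>

struct Node {
  int datum;
  Node *left;
  Node *right;
};

// lower and upper are exclusive bounds inherited from ancestors;
// a null pointer means there is no bound on that side.
bool is_bst_impl(const Node *tree, const int *lower, const int *upper) {
  if (!tree) {
    return true;  // the empty tree is a valid BST
  }
  if (lower && tree->datum <= *lower) {
    return false;
  }
  if (upper && tree->datum >= *upper) {
    return false;
  }
  // Left elements must be below this datum; right elements above it.
  return is_bst_impl(tree->left, lower, &tree->datum) &&
         is_bst_impl(tree->right, &tree->datum, upper);
}

bool is_bst(const Node *tree) {
  return is_bst_impl(tree, nullptr, nullptr);
}

// Hypothetical example: root 5, left child 3 whose right child is
// the given value, and right child 8.
bool example_is_bst(int left_grandchild) {
  Node inner = {left_grandchild, nullptr, nullptr};
  Node n3 = {3, nullptr, &inner};
  Node n8 = {8, nullptr, nullptr};
  Node n5 = {5, &n3, &n8};
  return is_bst(&n5);
}
```

With 4 as the grandchild the tree is valid. With 7 it is not: 7 is correctly placed relative to its parent 3, but it sits in the left subtree of the root 5 without being smaller than 5, which is the same kind of violation described for the tree on the left.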
A binary search tree is thus named because searching for an element can be done efficiently, in time proportional to the height of the tree rather than the size. A search algorithm need only recurse on one side of the tree at each level. For example, in searching for the element 2 in the BST in Figure 84, the element must be in the left subtree, since 2 is less than the root element 6. Within the left subtree, it must again be to the left, since 2 is less than 5. Within the next subtree, the 2 must be to the right of the 1, leading us to the actual location of the 2.
More generally, the following algorithm determines whether or not a BST contains a particular value:
If the tree is empty, it does not contain the value.
Otherwise, if the root datum is equal to the value, the tree contains the element.
If the value is less than the root element, it must be in the left subtree if it is in the tree, so we repeat the search on the left subtree.
Otherwise, the value is greater than the root element, so it is in the right subtree if it is in the tree at all. Thus, we repeat the search on the right subtree.
The first two cases above are base cases, since they directly compute an answer. The latter two are recursive cases, and we take the recursive leap of faith that the recursive calls will compute the correct result on the subtrees.
The algorithm leads to the following implementation on our tree representation:
    // REQUIRES: node represents a valid binary search tree
    // EFFECTS: Returns whether or not the given value is in the tree
    //          represented by node.
    bool contains(const Node *node, int value) {
      if (!node) {                        // empty tree
        return false;
      } else if (node->datum == value) {  // non-empty tree, equal to root
        return true;
      } else if (value < node->datum) {   // less than root
        return contains(node->left, value);
      } else {                            // greater than root
        return contains(node->right, value);
      }
    }
This implementation is linear recursive, since at most one recursive call is made by each invocation of contains(). Furthermore, every recursive call is a tail call, so the implementation is tail recursive.
This example illustrates that the body of a linear- or tail-recursive function may contain more than one recursive call, as long as at most one of those calls is actually made.
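Because every recursive call in contains() is a tail call, the function can be rewritten as a loop that reassigns the node pointer instead of recursing. The sketch below shows that transformation; the name contains_iterative() and the example tree (loosely modeled on the search for 2 described earlier, with an assumed right child 7) are illustrative assumptions.

```cpp
#include <cassert>

struct Node {
  int datum;
  Node *left;
  Node *right;
};

// Loop version of the tail-recursive contains(): each iteration
// corresponds to one recursive call in the original.
bool contains_iterative(const Node *node, int value) {
  while (node) {
    if (node->datum == value) {
      return true;
    }
    // Descend into the only subtree that could hold the value.
    node = (value < node->datum) ? node->left : node->right;
  }
  return false;  // reached an empty tree
}

// Hypothetical example tree: root 6, left child 5, right child 7;
// 5 has left child 1, and 1 has right child 2.
bool example_contains(int value) {
  Node n2 = {2, nullptr, nullptr};
  Node n1 = {1, nullptr, &n2};
  Node n5 = {5, &n1, nullptr};
  Node n7 = {7, nullptr, nullptr};
  Node n6 = {6, &n5, &n7};
  return contains_iterative(&n6, value);
}
```

Searching for 2 visits 6, 5, 1, and 2 in turn, one node per level, which is why the running time is proportional to the height of the tree.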
Let us consider another algorithm, that of finding the maximum element of a BST, which requires there to be at least one element in the tree.
If there is only one element, then the lone, root element is the maximum. The root is also the maximum element when the right subtree is empty; everything in the left subtree is smaller than the root, making the root the largest item.
On the other hand, when the right subtree is not empty, every element in that subtree is larger than the root and everything in the left subtree. Then the largest element in the whole tree is the same as the largest element in the right subtree.
We have the following algorithm:
If the right subtree is empty, then the root is the maximum
Otherwise, the maximum item is the maximum element in the right subtree
We implement the algorithm as follows:
    // REQUIRES: node represents a valid non-empty binary search tree
    // EFFECTS: Returns the maximum element in the tree represented by
    //          node.
    int tree_max(const Node *node) {
      if (!node->right) {  // base case
        return node->datum;
      } else {             // recursive case
        return tree_max(node->right);
      }
    }
As with contains(), tree_max() is tail recursive, and it runs in time proportional to the height of the tree.
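By symmetry, the minimum element of a non-empty BST can be found by following left children until the left subtree is empty. The notes do not define such a function, so tree_min() below is a hypothetical counterpart to tree_max(), shown alongside it on an assumed example tree.

```cpp
#include <cassert>

struct Node {
  int datum;
  Node *left;
  Node *right;
};

// REQUIRES: node represents a valid non-empty binary search tree
int tree_max(const Node *node) {
  if (!node->right) {
    return node->datum;  // nothing larger exists
  }
  return tree_max(node->right);
}

// REQUIRES: node represents a valid non-empty binary search tree
// Hypothetical mirror image of tree_max().
int tree_min(const Node *node) {
  if (!node->left) {
    return node->datum;  // nothing smaller exists
  }
  return tree_min(node->left);
}

// Hypothetical example tree: root 6, left child 5, right child 7;
// 5 has left child 1, and 1 has right child 2.
bool example_min_max() {
  Node n2 = {2, nullptr, nullptr};
  Node n1 = {1, nullptr, &n2};
  Node n5 = {5, &n1, nullptr};
  Node n7 = {7, nullptr, nullptr};
  Node n6 = {6, &n5, &n7};
  return tree_min(&n6) == 1 && tree_max(&n6) == 7;
}
```

Neither function ever inspects the subtree on the other side, since the sorting invariant guarantees the answer cannot be there.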
The BinarySearchTree Interface
In the full linked-list definition we saw in Linked Lists, we defined a separate IntList class to act as the interface for the list, defining Node as a member of that class. We then generalized the element type, resulting in a List class template. We follow the same strategy here by defining a BinarySearchTree class template as the interface for a BST.
    template <typename T>
    class BinarySearchTree {
    public:
      // EFFECTS: Constructs an empty BST.
      BinarySearchTree();

      // EFFECTS: Constructs a copy of the given tree.
      BinarySearchTree(const BinarySearchTree &other);

      // EFFECTS: Replaces the contents of this tree with a copy of the
      //          given tree.
      BinarySearchTree & operator=(const BinarySearchTree &other);

      // EFFECTS: Destructs this tree.
      ~BinarySearchTree();

      // EFFECTS: Returns whether this tree is empty.
      bool empty() const;

      // EFFECTS: Returns the number of elements in this tree.
      int size() const;

      // EFFECTS: Returns whether the given item is contained in this
      //          tree.
      bool contains(const T &value) const;

      // REQUIRES: value is not in this tree
      // EFFECTS: Inserts the given item into this tree.
      void insert(const T &value);

    private:
      // Represents a single node of a tree.
      struct Node {
        T datum;
        Node *left;
        Node *right;
        // INVARIANTS: left and right are either null or pointers to
        //             valid Nodes
      };

      // The root node of this tree.
      Node *root;
      // INVARIANTS: root is either null or a pointer to a valid Node
    };
As with a list, we define Node as a nested class of BinarySearchTree to encapsulate it within the latter ADT. Since it is an implementation detail and not part of the BST interface, we define Node as a private member.
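The interface declares insert() but these notes do not show its body. As a rough sketch of how insertion could work on the node representation, the free function below walks down the tree the same way a search would and attaches a new node where the search reaches an empty tree. The names insert_impl(), destroy(), and example_insert() are hypothetical, the element type is fixed to int for brevity, and the actual course implementation may differ.

```cpp
#include <cassert>

struct Node {
  int datum;
  Node *left;
  Node *right;
};

// REQUIRES: node represents a valid BST that does not contain value
// EFFECTS: Inserts value and returns the root of the resulting tree.
Node *insert_impl(Node *node, int value) {
  if (!node) {
    return new Node{value, nullptr, nullptr};  // new leaf goes here
  }
  if (value < node->datum) {
    node->left = insert_impl(node->left, value);
  } else {
    node->right = insert_impl(node->right, value);
  }
  return node;  // root of this subtree is unchanged
}

// Frees every node in the tree (postorder, so children go first).
void destroy(Node *node) {
  if (node) {
    destroy(node->left);
    destroy(node->right);
    delete node;
  }
}

// Hypothetical helper: inserts 6, 4, 9 into an empty tree and checks
// that 4 and 9 end up as the left and right children of 6.
bool example_insert() {
  Node *root = nullptr;
  root = insert_impl(root, 6);
  root = insert_impl(root, 4);
  root = insert_impl(root, 9);
  bool ok = root->datum == 6 && root->left->datum == 4 &&
            root->right->datum == 9;
  destroy(root);
  return ok;
}
```

Returning the subtree root lets the caller reattach the result, which handles the empty-tree case without any special treatment of the root pointer.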
The contains() member function differs from the one we defined before; the member function just takes in a data item, while our previous definition operates on a node as well. We define the latter as a private helper function and call it with the root node:
    template <typename T>
    class BinarySearchTree {
      ...
    public:
      bool contains(const T &value) const {
        return contains_impl(root, value);
      }

    private:
      // Declared const so that the const member contains() can call it.
      bool contains_impl(const Node *node, const T &value) const {
        if (!node) {
          return false;
        } else if (node->datum == value) {
          return true;
        } else if (value < node->datum) {
          return contains_impl(node->left, value);
        } else {
          return contains_impl(node->right, value);
        }
      }

      Node *root;
    };
Observe that contains_impl() does not refer to a BinarySearchTree or any of its members. Thus, there is no need for it to have a this pointer to a BinarySearchTree object. We can declare the function as a static member function to eliminate the this pointer.
    template <typename T>
    class BinarySearchTree {
      ...
    public:
      bool contains(const T &value) const {
        return contains_impl(root, value);
      }

    private:
      static bool contains_impl(const Node *node, const T &value) {
        if (!node) {
          return false;
        } else if (node->datum == value) {
          return true;
        } else if (value < node->datum) {
          return contains_impl(node->left, value);
        } else {
          return contains_impl(node->right, value);
        }
      }

      Node *root;
    };
Just like a static member variable is associated with a class rather than an individual object, a static member function is also not associated with an individual object, and it cannot refer to non-static member variables.
A public static member function can be called from outside the class using the scope-resolution operator, the same syntax used to refer to a static member variable. Because contains_impl() is private, however, the following call from outside the class does not compile:
    BinarySearchTree<int>::contains_impl(nullptr, -1);
    // compile error because contains_impl() is not public
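For contrast, here is a small sketch in which a static member function is public and can therefore be called through the class name without creating an object. The class template Example and its members are hypothetical and exist only to illustrate the access rule; they are unrelated to the BinarySearchTree interface.

```cpp
#include <cassert>

template <typename T>
class Example {
public:
  // Public static member: callable as Example<T>::is_positive(...).
  // It has no this pointer and touches no member variables.
  static bool is_positive(const T &value) {
    return value > T();  // T() is the value-initialized T, 0 for int
  }

private:
  // Private static member: only code inside Example may call it.
  static bool hidden(const T &) {
    return true;
  }
};

// Hypothetical helper showing a call from outside the class.
bool example_static_call() {
  // Example<int>::hidden(3);  // would not compile: hidden() is private
  return Example<int>::is_positive(3);  // fine: is_positive() is public
}
```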
Previously, we have seen array-based set implementations, one that used an unsorted array and another a sorted array. We can also implement a set using a binary search tree to store the data:
    template <typename T>
    class BSTSet {
    public:
      // EFFECTS: Inserts the given value into this set, if it is not
      //          already in the set.
      void insert(const T &value) {
        if (!elts.contains(value)) {
          elts.insert(value);
        }
      }

      // EFFECTS: Returns whether value is in this set.
      bool contains(const T &value) const {
        return elts.contains(value);
      }

      // EFFECTS: Returns the size of this set.
      int size() const {
        return elts.size();
      }

    private:
      BinarySearchTree<T> elts;
    };
If the underlying BST is balanced, meaning that each subtree within the BST has close to the same number of elements in its left and right child, then the height of the tree is in $O(\log n)$, where $n$ is the size of the tree. Thus, the contains() and insert() operations take logarithmic time, rather than the linear time they would take on a set backed by an unsorted array.
Unfortunately, our BST implementation does not guarantee that it will be balanced. In fact, inserting items in increasing order results in a maximally unbalanced tree.
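The effect of insertion order on balance can be observed directly. The sketch below uses a simple unbalanced insertion routine (the hypothetical insert_impl(), an assumption about how insertion works since the notes do not show it) and compares the resulting height for two orderings of the same seven values. The helper names height_after_inserting() and destroy() are likewise illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Node {
  int datum;
  Node *left;
  Node *right;
};

// Hypothetical unbalanced insertion: descend as a search would and
// attach the new node at the empty spot. Assumes no duplicates.
Node *insert_impl(Node *node, int value) {
  if (!node) {
    return new Node{value, nullptr, nullptr};
  }
  if (value < node->datum) {
    node->left = insert_impl(node->left, value);
  } else {
    node->right = insert_impl(node->right, value);
  }
  return node;
}

int height(const Node *tree) {
  if (!tree) {
    return 0;
  }
  return 1 + std::max(height(tree->left), height(tree->right));
}

void destroy(Node *node) {
  if (node) {
    destroy(node->left);
    destroy(node->right);
    delete node;
  }
}

// Builds a BST by inserting the values in the given order and
// returns the height of the result.
int height_after_inserting(const std::vector<int> &values) {
  Node *root = nullptr;
  for (int value : values) {
    root = insert_impl(root, value);
  }
  int result = height(root);
  destroy(root);
  return result;
}
```

Inserting 1 through 7 in increasing order makes every new element the right child of the previous one, so the height equals the size (7) and a search degrades to a linear scan. Inserting the same values in the order 4, 2, 6, 1, 3, 5, 7 yields a tree of height 3.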