Content
Basics of Container ADTs Set Operations static Data Members size_t Possible Implementations Operator Overloading Sorted Representation Time Complexity Function Templates <- Go BackUniversity of Michigan at Ann Arbor
Last Edit Date: 03/25/2023
Disclaimer and Term of Use:
We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file
This personal note is adapted from Professor Amir Kamil, Andrew DeOrio, James Juett, Sofia Saleem, and Saquib Razak. Please contact us to delete this file if you think your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
A container is an abstract data type whose purpose is to hold objects of some other type. We have already seen two examples of containers: built-in arrays and vectors from the standard library. Both are containers that hold elements in a particular order, indexed starting from 0. A built-in array has a fixed size, while a vector can grow and shrink.
Let’s define our own container ADT. Rather than an ordered sequence, we will build a set abstraction, which represents an unordered collection of unique elements. We will start with a container that holds integers, generalizing it to arbitrary element types next time.
In designing our ADT, we will use the following process:
Determine which operations our ADT will support.
Write some code that uses our ADT. This will helps us figure out what the right interface should be, and it will serve as a test case. (Ideally, we would also write unit tests for each part of the public interface, following the pattern of test-driven development.)
Define the ADT’s public interface, including both function signatures and documentation.
Come up with a data representation. Define member variables and determine the representation invariants.
Write definitions for the ADT’s constructors and member functions, making sure that they adhere to the representation invariants.
The following are the operations our set will support:
Creating an empty set.
Inserting a value into a set. We will allow an existing item to to be inserted - it will just do nothing.
Removing a value from a set. We will allow a nonexistent item to to be removed - it will just do nothing.
Check if a value if contained in a set.
Count the number of items in a set.
Print a character representation of a set to an output stream.
The following is a public interface of set and an example code that uses set operations:
1 class IntSet {
2 public:
3 // Maximum size of a set.
4 static const int MAX_SIZE = 10;
5
6 // EFFECTS: Initializes this set to be empty.
7 IntSet();
8
9 // REQUIRES: size() < MAX_SIZE
10 // MODIFIES: *this
11 // EFFECTS: Adds value to the set, if it isn't already in the set.
12 void insert(int value);
13
14 // MODIFIES: *this
15 // EFFECTS: Removes value from the set, if it is in the set.
16 void remove(int value);
17
18 // EFFECTS: Returns whether value is in the set.
19 bool contains(int value) const;
20
21 // EFFECTS: Returns the number of elements.
22 int size() const;
23
24 // EFFECTS: Prints out the set in an arbitrary order.
25 void print(std::ostream &os) const;
26 };
27
28 int main() {
29 IntSet set;
30 set.insert(7);
31 set.insert(32);
32 set.insert(32);
33
34 cout << "Size: " << set.size() << endl; // prints 2
35 set.print(cout); // prints { 7, 32 } in some arbitrary order
36
37 set.insert(42);
38 set.remove(32);
39 set.remove(32);
40
41 cout << "Contains 32? " << set.contains(32) << endl; // prints 0 (false)
42 cout << "Contains 42? " << set.contains(42) << endl; // prints 1 (true)
43 }
The public interface includes a default constructor and member functions to insert or remove an item, check if an item is in the set, obtain its size, and print a representation of the set to a stream.
The latter three functions do not modify the set, so they are declared as const
, meaning that the this
pointer will be a pointer to const. The compiler will then enforce that those functions do not modify any member variables.
static
Data Members¶Note that the public interface has a constant, MAX_SIZE
, that denotes the maximum size of a set. We define the constant inside the InSet
class. This distinguishes the constant from other constants of the same name (there may have some other ADTs with const
).
If a constant is declared as static
, which has a specific meaning when used in a class - it indicates that the given member is not associated with an object of the class, but with the class as a whole.
For a member variable, this means that there is a single copy of that variable in the program, located in static storage.
A static member can be accessed by name from within the class, the same as a non-static member. It can be accessed from outside the class with the scope-resolution operator:
1 int main() {
2 IntSet set;
3 for (int i = 0; i < IntSet::MAX_SIZE; ++i) {
4 set.insert(i);
5 }
6 }
We need a compile-time constant because we will use a member variable that is a built-in array as part of our data representation.
In C++, a local or member variable that is an array must have a size that is known to the compiler. In the case of a local variable, the compiler needs to know how big it is so that the program can create activation records of the correct size.
For a member variable, the compiler needs to know its size so that it can determine the size of the object as a whole. Pointer arithmetic (including array accesses) requires the compiler to know the size of an object at compile time, since it is in terms of whole objects rather than with respect to individual bytes.
size_t
¶The size_t
type represents only nonnegative integers, and the C++ standard guarantees that it is large enough to hold the size of any objects.
C++ has integral types that are unsigned, which do not represent negatie numbers, and size_t
is just one example.
The int
type is signed (unless it is produced by the unsigned
keyword).
Unsigned integers can lead to subtle programming errors. The following is an example that compares a signed and unsigned integer:
1 int main() {
2 size_t s = 3;
3 int i = -1;
4 cout << (s < i) << endl; // prints 1 (true)
5 }
In the code above, the compiler converts i
to a size_t
in doing the comparaison, and the data that represents a negative signed integer actually represents a very large unsigned integer.
Try it out:
To traverse using size_t
, we should always make sure it is greater than 0, otherwise the compiler will think it is a very large unsigned integer. The following is an example that shows a correct traversal using size_t
.
Try it out:
We, C++ programmers, usually use signed integers because of the pitfalls of unsigned integers.
Suppose we have the following class, which simulates the set objects.
1 static const int ELTS_CAPACITY = 10;
2 class IntSet {
3 private:
4 int elts[10];
5 int elts_size;
6
7 // EFFECTS: Returns the index of the v in the elts
8 // array. If not present, returns -1.
9 int indexOf(int v) const {
10 for(int i = 0; i < elts_size; ++i){
11 if(elts[i] == v){
12 return i;
13 }
14 }
15 return -1;
16 }
17
18 public:
19
20 // IntSet constructor - creates an empty set.
21 IntSet() : elts_size(0) { }
22
23 // EFFECTS: returns whether v is in the set
24 bool contains(int v) const {
25 return indexOf(v) != -1;
26 }
27
28 // REQUIRES: size() < ELTS_CAPACITY
29 // EFFECTS: adds v to the set if not already present
30 void insert(int v) {
31 if (!contains(v)) {
32 elts[elts_size++] = v;
33 }
34 }
35
36 // EFFECTS: removes v from the set
37 void remove(int v) {
38 int i = indexOf(v);
39 if (i != -1) {
40 elts[i] = elts[elts_size - 1];
41 elts_size--;
42 }
43 }
44
45 // EFFECTS: returns the number of elements
46 int size() const {
47 return elts_size;
48 }
49
50 // EFFECTS: prints out the set
51 void print(ostream &os) const {
52 os << "{" << " ";
53 if (elts_size > 0) {
54 os << elts[0];
55 }
56 for(int i = 1; i < elts_size; ++i) {
57 os << ", " << elts[i];
58 }
59 os << " }" << endl;
60 }
61 };
We will overload a <<
operator for ourselves to print the set much easier.
1 ostream &operator<<(ostream &os, const IntSet &s) {
2 s.print(os);
3 return os;
4 }
The following code has some test cases.
Try it out:
C++ follows the philosophy that user-defined types should have the same access to language facilities as built-in types. Since operators can be used with built-in atomic types, C++ allows operators to be applied to class types through operator overloading.
Most operators in C++ can be overloaded. An operator overload requires at least one of the operands to be of class type – the behavior of operators on atomic types cannot be changed.
The following is a member function that defines the +
operation to compute the union of two IntSets
:
1 IntSet IntSet::operator+(const IntSet &rhs) const {
2 IntSet result = *this;
3 for (int i = 0; i < rhs.num_elements; ++i) {
4 result.insert(rhs.elements[i]);
5 }
6 return result;
7 }
We can use the overloaded operator as following:
1 int main() {
2 IntSet set1;
3 set1.insert(32);
4 set1.insert(42);
5 set1.insert(7);
6
7 IntSet set2;
8 set2.insert(12);
9 set2.insert(-3);
10 set2.insert(42);
11
12 IntSet set3 = set1.operator+(set2);
13 set3.print(cout); // prints { 32, 42, 7, 12, -3 }
14
15 IntSet set4 = set1 + set2;
16 set4.print(cout); // prints { 32, 42, 7, 12, -3 }
17 }
Though most operators can be overloaded either as top-level or member functions, there are some cases where we must use a top-level function:
The first operand is of atomic type. Atomic types are not classes, so they do not have member functions.
The first operand is of class type, but we do not have access to the class definition, so we cannot define a new member function.
An example of the latter is overloading the stream-insertion operator, where we do not have access to the definition of ostream.
In other cases, we need to define an operator as a member function:
If the overload needs access to private members, a member function would have access because it is part of the class.
Some operators can only be overloaded as member functions: the assignment operator (=
), the function-call operator (()
), the subscript operator ([]
), and the arrow operator (->
).
For example:
1 bool IntSet::operator[](int value) const {
2 return contains(value);
3 }
The following is an example of applying the operator:
1 int main() {
2 IntSet set;
3 set.insert(32);
4 cout << set[32] << endl; // prints 1 (true)
5 cout << set[42] << endl; // prints 0 (false)
6 }
The representation of IntSet places no restrictions on the ordering of elements in the array. This simplifies the implementation of insert()
and remove()
, but it requires contains()
(and indexOf()
) to iterate over every element in the worst case.
An alternate representation would be to require that the set elements are stored in sorted, increasing order. The member variables remain the same – it is the representation invariants that change.
The following is the implementation of a sorted set:
1 // Maximum capacity of a set.
2 const int ELTS_CAPACITY = 10;
3 class SortedIntSet {
4 public:
5
6 // SortedIntSet constructor - creates an empty set.
7 SortedIntSet() : elts_size(0) { }
8
9 // EFFECTS: returns whether v is in the set
10 bool contains(int v) const {
11 return indexOf(v) != -1;
12 }
13
14 // REQUIRES: size() < ELTS_CAPACITY
15 // EFFECTS: adds v to the set if not already present
16 void insert(int v) {
17 if (!contains(v)){
18 int i = elts_size;
19 for (; i > 0 && elts[i - 1] > v; i--){
20 elts[i] = elts[i - 1];
21 }
22 elts[i] = v;
23 elts_size++;
24 }
25 }
26
27 // EFFECTS: removes v from the set
28 void remove(int v) {
29 int i = indexOf(v);
30 if (i == -1) {
31 return;
32 }
33 for (; i < elts_size - 1; ++i) {
34 elts[i] = elts[i + 1];
35 }
36 --elts_size;
37 }
38
39 // EFFECTS: returns the number of elements
40 int size() const {
41 return elts_size;
42 }
43
44 // EFFECTS: prints out the set
45 void print(ostream &os) const {
46 os << "{" << " ";
47 if (elts_size > 0) {
48 os << elts[0];
49 }
50 for(int i = 1; i < elts_size; ++i) {
51 os << ", " << elts[i];
52 }
53 os << " }" << endl;
54 }
55
56 private:
57
58 // INVARIANT - the array is maintained in ascending sorted order
59 int elts[10];
60 int elts_size;
61
62 // EFFECTS: Returns the index of the v in the elts
63 // array. If not present, returns -1.
64 int indexOf(int v) const {
65 return binarySearch(v, 0, elts_size);
66 }
67
68 // Helper function to perform binary search for indexOf
69 int binarySearch(int v, int start, int end) const {
70 while (start < end) {
71 int middle = start + (end - start) / 2;
72 if (v == elts[middle]) {
73 return middle;
74 }
75 else if (v < elts[middle]) {
76 end = middle;
77 }
78 else {
79 start = middle + 1;
80 }
81 }
82 return -1;
83 }
84
85 };
86
87 ostream &operator<<(ostream &os, const SortedIntSet &s) {
88 s.print(os);
89 return os;
90 }
The following code has some test cases.
Try it out:
Operation | Unsorted Set | Sorted Set |
---|---|---|
insert() |
$O(n)$ | $O(n)$ |
remove() |
$O(n)$ | $O(n)$ |
contains() |
$O(n)$ | $O(\log n)$ |
size() |
$O(1)$ | $O(1)$ |
constructor() |
$O(1)$ | $O(1)$ |
We can define a function as a template, resulting in a function template. For example, the following computes the maximum of two items of the same type:
1 template <typename T>
2 T max(const T &value1, const T &value2) {
3 return value2 > value1 ? value2 : value1;
4 }
Templates allow us to implement a generic set container with a flexible element type. The following is the interface of an unsorted set with template.
1 template <typename T>
2 class UnsortedSet {
3 public:
4 // Maximum size of a set.
5 static const int MAX_SIZE = 10;
6
7 // EFFECTS: Initializes this set to be empty.
8 UnsortedSet();
9
10 // REQUIRES: size() < MAX_SIZE
11 // MODIFIES: *this
12 // EFFECTS: Adds value to the set, if it isn't already in the set.
13 void insert(const T &value);
14
15 // MODIFIES: *this
16 // EFFECTS: Removes value from the set, if it is in the set.
17 void remove(const T &value);
18
19 // EFFECTS: Returns whether value is in the set.
20 bool contains(const T &value) const;
21
22 // EFFECTS: Returns the number of elements.
23 int size() const;
24
25 // EFFECTS: Prints out the set in an arbitrary order.
26 void print(std::ostream &os) const;
27
28 private:
29 T elements[MAX_SIZE];
30 int num_elements;
31 // INVARIANTS:
32 // 0 <= num_elements && num_elements <= MAX_SIZE
33 // the first num_elements items in elements are the items in the set
34 // the first num_elements items in elements contain no duplicates
35 };
Compiling Templates
For a better organization, we can separate the definetions into their own file; common convention is to use a suffix lilke .tpp
for this file. We can then use a #include
directive to pull the code into the header file.
The following code should be in max.h
:
1 template <typename T>
2 T max(const T &value1, const T &value2);
3
4 #include "max.tpp"
The following is the implementation of the function max()
, which is in the max.tpp
:
1 template <typename T>
2 T max(const T &value1, const T &value2) {
3 return value2 > value1 ? value2 : value1;
4 }
Code that uses the max module would then just #include "max.h"
, which would transitively include max.tpp
.
Include Guards
To avoid problems with a header being included more than once, headers generally have include guards that arrange for the compiler to ignore the code if the header is included a second time. The following is an example of include guards in max.h
:
1 #ifndef MAX_H
2 #define MAX_H
3
4 template <typename T>
5 T max(const T &value1, const T &value2);
6
7 #include "max.tpp"
8 #endif /* MAX_H */
The #ifndef
and #define
directives are the opening of the include guard, and the #endif
closes it. The code within is ignored if the header is included again.