EECS 281: Data Structures and Algorithms

Section 08 Ordered and Sorted Ranges Algorithms and Data Structures to Represent Sets

University of Michigan at Ann Arbor

Last Edit Date: 06/07/2023

Disclaimer and Term of Use:

We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file

This personal note is adapted from Professor Marcus Darden and David Paoletti. Please contact us to delete this file if you think your rights have been violated.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Ordered and Sorted Containers¶

i. Container Review

Object storing a variable number of other objects
Allow for control / protection of data
Can copy / edit / sort / move many objects at once
Examples: vector, deque, stack, map, list, array

ii. Types of Containers

Type	Distinctive interfaces (not all methods listed)
Container	Supports `add()` and `remove()` operations
Searchable Container	Adds `find()` operation
Ordered Container	Sequential container which maintains current order. Can arbitrarily insert new elements anywhere. Example: Books in a shelf
Sorted Container	Sequential container with pre-defined order. Cannot arbitrarily insert elements. Example: Students sorted by ID

iii. Ordered Clarification

Ordered container elements maintain thier "relative" position unless they are removed
Example: If A comes before Q, and then Z is inserted between them, their relative ordering has not changed
- A still comes before Q
If Q then removed, the container is still ordered
- Before and after delete, A comes before Z

iv. Implemented Sorted and Ordered Containers

Two simplementation styles: connected, contiguous

Preferred implementation dependent upon requirements of application
- Know which operations will be called often
Study multiple implementations
- Know asymptotic complexity of each operation

v. Asymptototic Complexities

Ordered Container:

Operation	Array	Linked List
`addElement(val)`	$$O(1)$$	$$O(1)$$
`remove(val)`	$$O(n)$$	$$O(n)$$
`remove(iterator)`	$$O(n)$$	$$O(n)~\text{or}~O(1)$$
`find(val)`	$$O(n)$$	$$O(n)$$
`Iterator::operator*()`	$$O(1)$$	$$O(1)$$
`operator[](unsigned)`	$$O(1)$$	$$O(n)$$
`insertAfter(iterator, val)`	$$O(n)$$	$$O(1)$$
`insertBefore(iterator, val)`	$$O(n)$$	$$O(n)~\text{or}~O(1)$$

Sorted Container:

Operation	Array	Linked List
`addElement(val)`	$$O(n)$$	$$O(n)$$
`remove(val)`	$$O(n)$$	$$O(n)$$
`remove(iterator)`	$$O(n)$$	$$O(n)~\text{or}~O(1)$$
`find(val)`	$$O(\log n)$$	$$O(n)$$
`Iterator::operator*()`	$$O(1)$$	$$O(1)$$
`operator[](unsigned)`	$$O(1)$$	$$O(n)$$
`insertAfter(iterator, val)`	N/A	N/A
`insertBefore(iterator, val)`	N/A	N/A

Binary Search and Bounds¶

Note: Binary search requires the data to be sorted.

Sample code:

 1     int bsearch(double a[], double val, 
 2                 int left, int right) {
 3         while (right > left) {
 4             int mid = left + (right - left) / 2;
 5             if (val == a[mid])
 6                 return mid;
 7             if (val < a[mid])
 8                 right = mid;
 9             else
10                 left = mid + 1;
11         } // while
12         return -1; // val not found
13     } // bsearch()

Comparators (Function Objects)

Given elements a and b, tell if a < b.

 1     struct Point {
 2         int x, y;
 3         Point() : x(0), y(0) {}
 4         Point(int xx, int yy) : x(xx), y(yy) {};
 5     };

 1     struct CompareByX {
 2         bool operator() (const Point &a, const Point &b) const {
 3             return a.x > b.x;      // Dscending sort by X 
 4         }
 5     };

 1     struct CompareByY {
 2         bool operator() (const Point &a, const Point &b) const {
 3             return a.y > b.y;      // Ascending sort by Y
 4         }
 5     };

Speeding up Binary Search

Speed-up idea: == rarely triggers
- check if < first
- else if >
- else must be ==
More radical idea: move the == check out of the loop
- Find a sharp lower bound for the sought element first
- First item >= what I am looking for
- Check for the value == after the loop

2 Comparasions Half the Time:

 1     int bsearch(double a[], double val,
 2         int left, int right) {
 3         while (right > left) { // ONE
 4             int mid = left + (right - left) / 2;
 5
 6             if (a[mid] < val) // TWO: check < not ==
 7                 left = mid + 1;
 8             else if (val < a[mid]) // THREE
 9                 right = mid;
10             else // (val == a[mid])
11                 return mid;
12         } // while
13
14         return -1; // val not found
15     } // bsearch()

Always 2 Comparasions / Loop

 1     int lower_bound(double a[], double val,
 2         int left, int right) {
 3         while (right > left) { // ONE
 4             int mid = left + (right - left) / 2;
 5
 6             if (a[mid] < val) // TWO
 7                 left = mid + 1;
 8             else // (a[mid] >= val)
 9                 right = mid;
10         } // while
11
12         return left;
13     } // lower_bound()

Binary Search in STL

binary_search() returns a bool, not the location.

To find the locations (iterators), use

lower_bound() First item not less than target
upper_bound() First item greater than target
equal_range() Pair(lower_bound, upper_bound), all items equal to target

Try it out:

Open in new window

Representing Sets¶

i. Searchable Containers as Sets

A set is well-defined if you can tell if any given element is in the set.

Set Operations (SLT implements many of these):

Union ($\cup$): in one set or the other (OR)
Intersection ($\cap$): in both sets (AND)
Set Difference (\): in one and not the other (AND-NOT)
Symmetric Difference ($\div$): in only one (XOR)
addElement ($+$)
isElement ($\in$)
isEmpty

set_union() Example

Set 1 and Set 2 are sorted ranges. Set 3 is a union of Set 1 and Set 2.

set_union Example Code

 1     template<class ForIterator1, class ForIterator2, class OutIterator,
 2                    class Compare>
 3     OutIterator set_union(ForIterator1 first1, ForIterator1 last1,
 4                           ForIterator2 first2, ForIterator2 last2, 
 5                           OutIterator result, Compare compare) {
 6         while (first1 != last1 && first2 != last2) {
 7             if (compare(*first1, *first2))
 8                 *result++ = *first1++; // set1 element less than set2 element
 9             else if (compare(*first2, *first1)) 
10                 *result++ = *first2++; // set2 element less than set1 element
11             else {
12                 *result++ = *first1++; // set1 element == set2 element
13                 ++first2;
14             } // else
15         } // while
16         while (first1 != last1)
17             *result++ = *first1++; // Remaining elements
18         while (first2 != last2)
19             *result++ = *first2++; // Remaining elements
20         return result; // returns end of sorted union of set1 and set2
21     } // set_union()

Implementing [sub]sets with ranges

Method	Asymptotic Complexity
`initialize()`	$$O(1)\text{ or }O(n \log n)$$
`clear()`	$$O(1)\text{ or }O(n)$$
`isMember()`	$$O(\log n)$$
`copy()`	$$O(n)$$
`set_union()`	$$O(n)$$
`set_intersection()`	$$O(n)$$

Universe: set of all elements that may be in a set.

Union-Find¶

i. Representing Disjoint Sets

Sets are disjoint if they do not share any elements (i.e. an element may not only belong to one set)
Many applications require representing and operating on disjoint sets.

Example:

$A = \{1, 9, 3, 2, 10\},~B = \{23, 13, 25\},~C = \{14, 15, 27, 6, 16, 17\}$ are disjoint.

In this context, only two set operations make sense:
- Find: check if an element belongs to a particular set (or if two belong to the same set)
- Union: merge two sets together into a single set
Consequently, data structures for representing disjoint sets are often referred to as "union-field".

Example 1: Separate Containers

Time complexity: Union() - $O(n)$; Find() - $O(n)$

Union-Fild Data Structure

Idea 1: every disjiont set should have its unique representive (selected element)
- Every set element k must know its representative j
Therefore: to tell if j and m are in the same set, compare their representatives
- Find() operation becomes quites fast

Example 2: Representatives

Making Union-Find Faster

Idea 2: When performing union of two sets, update the smaller set (less work)
Measure complexity of all unions throughout the lifecycle (together)
- We call Union() exactly n - 1 times
- If we connect to a disjoint element every time, it will take n time total (best case)
- Merging large sets, say n / 2 and n / 2 elements, will take $O(n)$ time for one Union() - too slow

Smarter Union-Find

Idea 3: No need to store the actual representative for each element, as long as you can find it quickly
- Each element knows someone who knows the representative (may need more steps)
- Union() becomes very fast: one of representatives will need to know the other
- Find() becomes slower

Example 3: Hierarchical Representatives

Path Compression

So far, Find() was read-only
- For element j, finds the representative k
- Traverses other elements on the way (for which k is also the representative)
Idea 4: We can tell j that its representative is now k
- Same for other elements on path from j to k
- Doubles workload of Find(), but same Big-O

Complexity with Path Compression

Must use amortized analysis over the life cycle of union-find (starting with n disjoint sets, and merging until there is one set containing all elements)
Result is surprising
- $O(n \times \alpha(n))$, where $\alpha()$ grows very slowly
- $\alpha()$ is the inverse-Ackerman function
- In practice, almost-linear-time performance
Details taught in more advanced courses

Example 4: Path Compression