Content
Sorting Overview Bubble Sort Selection Sort Insertion Sort Performance Analysis Counting Sort Using std::sort() <- Go BackUniversity of Michigan at Ann Arbor
Last Edit Date: 06/13/2023
Disclaimer and Term of Use:
We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file
This personal note is adapted from Professor Marcus Darden and David Paoletti. Please contact us to delete this file if you think your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
i. Computational Task & Solutions
Sort records a sequence (sort keys in ascending order, unless directed otherwise) by keys, with respect to an operator / functor.
bool operator< (const T&, constT&) const
Elementary Sorts | High-Performance Sorts |
---|---|
Bubble Sort | Quick Sort |
Selection Sort | Merge Sort |
Insertion Sort | HeapSort |
Counting Sort (int , few differernt keys) |
Radix Sort (int , char* ) |
ii. Elementary $\ne$ Useless
Elementary algorithms for sorting ...
Illustrate how to approach a problem
Illustrate how to compare multiple solution
Illustrate progra, optimization
Are sometimes "good enough"
Often combined with sophisticated sorts
iii. Which Containers Can be Sorted?
Array vs. Linked List Data Representation
Both will be considered, begin with arrays
Some basic tasks are better suited for arrays
Some basic tasks can be adapted to linked lists
iv. How Size Affects Sorting
Internal Sort (can see all elements)
Sequence to be sorted fits into memory
$O(1)$ time random access is available
Indirecte Sort (can see all indices)
External Sort (too many elements)
Item to be sorted are on disk
Items are accessed sequentially or in blocks
v. Desirable Qualities of Algorithms
Asymptotic complexity, number of comparisons (and / or swaps)
Worst-case
Average-case
Memory efficiency considerations
Sort in place with $O(1)$ extra memory
Sort in place with recursion (consider stack)
Sort in place, but have pointers or indices (n items need an additional n ptr
s or indices)
Some sorts need $\Omega (n)$ extra memory
Stability
Definition: preservation of relative order of items with duplicate keys
Elementary sorts tend to be stable (not all)
Complex sorts are often not stable
Important for sorting on two or more keys (sort alphabetically, then by SSN)
vi. Stability
A sorting algorithm is stable if data with the same key remains in the same relative locations.
Example: (already sorted by first name)
{"Sheen, Charlie", "Berry, Halle", "Liu, Lucy", "Sheen, Martin"}
Now we want to sort by last name:
Stable output: (two "Sheen" stay in relative locations)
{"Berry, Halle", "Liu, Lucy", "Sheen, Charlie", "Sheen, Martin"}
Unstable output:
{"Berry, Halle", "Liu, Lucy", "Sheen, Martin", "Sheen, Charlie"}
vii. Types of Algorithms
Non-Adaptive Sort
Sequence of operations performed is independent of order of data
Usually simpler to implement
Adaptive Sort
Worst-case versus best-case complexity
i. Code
1 void bubble(Item a[], size_t left, size_t right) {
2 for (size_t i = left; i < right - 1; i++) {
3 for (size_t j = right - 1; j > i; j++) {
4 if (a[j] < a[j - 1])
5 swap(a[j - 1], a[j]);
6 }
7 }
8 }
Adaptive Bubble Sort
1 void bubble(Item a[], size_t left, size_t right) {
2 for (size_t i = left; i < right - 1; i++) {
3 bool swapped = false;
4 for (size_t j = right - 1; j > i; j--) {
5 if (a[j] < a[j - 1]) {
6 swapped = true;
7 swap(a[j - 1], a[j]);
8 }
9 }
10 if (!swapped)
11 break;
12 }
13 }
ii. Bubble Sort Analysis
Non-adaptive version
$\sim n^2 / 2$ comparisons and swaps in worst and average cases
$\sim n^2 / 2$ comparisons, 0 swaps in best case
Adaptive version
$\sim n^2 / 2$ comparisons and swaps in worst and average cases
$\sim n$ comparisons in best case
$\sim n$ swaps (as few as 0) in best case
iii. Advantages & Disadvantage
Advantages
Simple to implement and understand
Completes some "pre-sorting" while searching for smallest key (adjacent pairs get swapped)
Stable
Adaptive version may finish quickly if the input array is almost sorted
Disadvantages
$O(n^2)$ time
Slower than high-performance sorts
i. Code
1 void selection(Item a[], size_t left, size_t right) {
2 for (size_t i = left; i < right - 1; i++) {
3 size_t minIndex = i;
4 for (size_t j = i + 1; j < rightl j++) {
5 if (a[j] < a[minIndex]) {
6 minIndex = j;
7 }
8 }
9 swap(a[i], a[minIndex]);
10 }
11 }
Find smallest element in array, swap with first position
Find second smallest element in array, swap with second position
Repeat until sorted
Non-adaptive
Adaptive Selection Sort
1 void selection(Item a[], size_t left, size_t right) {
2 for (size_t i = left; i < right - 1; i++) {
3 size_t minIndex = i;
4 for (size_t j = i + 1; j < rightl j++) {
5 if (a[j] < a[minIndex]) {
6 minIndex = j;
7 }
8 }
9 if (minIndex != i) {
10 swap(a[i], a[minIndex]);
11 }
12 }
13 }
Extra comparison checks if item is already in position
Eliminates unnecessary (costly) swaps
ii. Selection Sort Analysis
Non-adaptive version
$\sim n^2 / 2$ comparisons
$n - 1$ swaps in best, average, worst case
Run time is isensitive to input
Adaptive version
$(n^2 - n) / 2 + (n - 1)$ comparisons
$n - 1$ swaps worst case
0 swaps best case
Run time is sensitive to input
iii. Advantages & Disadvantage
Advantages
Minimal copying of items
Fairly efficient for small n
Disadvantages
$\Theta(n^2)$ time
Run time only slightly dependent upon pre-order
The smallest key on one pass tells nothing about the smallest key on subsequent passes
Runs about the same on
Already sorted array
Array with all keys equal
Randomly arranged array
Sort is not stable
i. Code
1 void insertion(Item a[], size_t left, size_t right) {
2 for (size_t i = left + 1; i < right; i++) {
3 for (size_t j = i; j > left; j--) {
4 if (a[j] < a[j - 1]) {
5 swap(a[j - 1], a[j]);
6 }
7 }
8 }
9 }
Divide items into two groups: sorted and unsorted
Move items one at a time from unsorted to sorted, inserting each into its proper place
Sorted group grows, unsorted group shrinks
Repeat until sorted
Non-adaptive
Adaptive Insertion Sort
1 void insertion(Item a[], size_t left, size_t right) {
2 for (size_t i = right - 1; i > left; i--) {
3 if (a[i] < a[i - 1]) {
4 swap(a[i - 1], a[i]);
5 }
6 }
7 for (size_t i = left + 2; i < right; i++) {
8 Item v = a[i];
9 size_t j = i;
10 while (v < a[j - 1]) {
11 a[j] = a[j - 1];
12 j--;
13 }
14 a[j] = v;
15 }
16 }
ii. Insertion Sort Analysis
Non-adaptive version
$\sim n^2 / 2$ comparisons in worst case
$\sim n^2 / 2$ swaps worst case
$\sim n^2 / 4$ swaps average case
Run time only slightly sensitive to input (comparisons always the same, but swaps can change)
Adaptive version
Comparisons
$\sim n^2 / 2$ in worst case
$\sim n^2 / 4$ in average case
$\sim n$ in best case
Copies
$\sim n^2 / 2$ in worst case
$\sim n^2 / 4$ in average case
$\sim n$ in best case
Run time sensitive to input
iii. Advantages & Disadvantage
Advantage
Run time depends upon initial order of keys in input
Stable
Algorithm is tunable
Best sort for small values of n
Disadvantages
Advantages of Adaptive Insertion Sort
More efficient data manipulation
Fewer comparisons / copies
Find sentinel key
Sentinel key is a smallest key in range
Adds overhead of explicit first pass to find sentine
i. Performance Characteristics
Run time proportional to
Number of comparisons
Number of swaps
Sometimes both comparisons and swaps
Elementary sorts are all quadratic time ($O(n^2)$) worst and average
Assuming inputs are uniformly distributed
Better average on pre-sorted inputs
ii. Performance Analysis
Inversion: A pair of keys that are out of order
For each item, can determine the number of inversions (number of items to left that are greater)
For each input, can determine total number of inversions
Gives sense of "presorted-ness"
Some sorts work better on presorted data
1. Insertion and Bubble Sort
Property: Insertion and bubble sort use linear number of comparisons and swaps for data with a constant upper limit to the number of inversions for each element (i.e. almost sorted: elements are not very far out of place)
Interpretation: Insertion sort and bubble sort are efficient on partially sorted data when each item is close to its "final" position. Insertion sort is efficient on sorted data when a new item is added to the sorted data.
2. Selection Sort
Property: Selection sort runs n linear time for data with large items and small keys (i.e. sort 1MB / person medical records in order by a single integer: their social security number)
Interpretation: Selection sort is expensive in iterms of comparisons, but cheap in terms of swaps; thus it is $O(n)$ in the number of swaps
First pass: Count the number of items that match each key (do not copy data, just count it)
Second pass: Compute offset where each key would start in sorted order
Third pass: Copy records into output container, grouped by key
ii. When Counting Sort Can be Used?
ONLY usable when the number of keys is limited (and small)
For example:
Number of possible birthdays snnually = 366 or 365
Number of possible class standings = 4
INT_MAX
is not "limited"
Changes with architecture
For 64 bit computers, $2^{64}$ 8-byte integers = 144 quintillion bytes = 160 million 10TB hard drives
iii. Example Code
1 void counting_sort(vector<Student> &students) {
2 vector<Student> work_copy(students.size());
3 vector<size_t> counts(MAX);
4 for (auto& s : students) { // Pass 1
5 ++counts[s.classStanding()];
6 }
7 for (size_t i = 1; i < MAX; i++) { // Pass 2
8 counts[i] += counts[i - 1];
9 }
10 for (auto& s : students) {
11 work_copy[counts[s.classStanding()] - 1] = s;
12 --counts[s.classStanding()];
13 }
14 swap(students, work_copy);
15 }
iv. Counting Sort Analysis
A "distribution sort", where items are not compared to each other
Time complexity is linear in the number of items and buckets $O(n + k)$
Space complexity is $O(n + k)$
Stable
Similar to and often used in Radix Sort
std::sort()
¶i. std::sort()
Parameters
Expects:
Two iterators [begin, end)
A natural sort order (contained data type supports operator <)
If no natural sort order, use the overloaded version with three parameters
Two iterators [begin, end)
A functor
ii. std::sort()
Functor
The function object determined sort order
Assume parameters are a
, b
For ascending sort, return true when a < b
For descending sort, use the same functor but reverse the arguments, b < a