University of Michigan at Ann Arbor
Last Edit Date: 05/30/2023
Disclaimer and Term of Use:
We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file
This personal note is adapted from Professor Marcus Darden and David Paoletti. Please contact us to delete this file if you think your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Needed to handle insert, search, and removal, when multiple keys has to the same table index.
Methods of Collision Resolution
Separate Chaining
Linear Probing (open addressing method)
Quandratic Probing (open addressing method)
Double Hashing (open addressing method)
Resolve collisions by maintaining M linked lists, one for each table index
Separate chaining reduces the number of comparisons for sequential search by a factor of M (on average), using extra space for M links.
In a table with M lists (table indices) and N keys, the probability that the number of keys in each list is within a small constant factor of N/M is extremely close to 1, if the hash function is good
i. Load Factor ($\alpha$)
$\alpha = N / M$, where $N$ keys are placed in an M-sized table
Separate Chaining ($\alpha \le 0.5$ by default)
$\alpha$ is average number of items per list
$\alpha$ can be $> 1$
Open Addressing
$\alpha$ is percentage of table indices occupied
$\alpha$ must be $\le 1$
ii. Separate Chaining Complexity
Insert: constant time
Search: time proportional to $\alpha$
Remove: dependent upon search
This is why we choose $M \approx N$: $O(\alpha) = O(1)$
iii. Speeding up the Worse Case
Use something other than linked lists
Use M vectors
Use M binary search trees, which is $O(\log \alpha)$ insert and search (as long as tree is not a stick)
Resolve collisions by using empty locations in the table
i. Probing
Check a location, by index, in the hash table
Used to search for data or find available locations
Possible probe outcomes (basic)
Empty: no data found at probed location
Hit: probe find occupied location that contains an item whose key matches the search key
Full: probe finds occupied location but item key does not match key
If probe result is full, then probe table at "next higher" cell unit hit (search ends successfully) or empty (search ends unsuccessfully)
ii. Linear Probing
Hash search key to get initial table index
Probe buckets at sequential indices, starting at table index, until search key or an open address is found
Collision resolution
$t(\text{key}) \mod M \Rightarrow$ table index
If bucket at table index is full, then try
iii. Clusters
A contiguous group of occupied table cells
Consider a table taht is half-full (N / M / 2)
What is the best-case distribution of half-full table?
Best case situation: $0.5 \Rightarrow \Theta(1)$
What is worst-case distribution of half-full table?
Worst case situation: $0.5 \times 1 + 0.5 \times (N / 2) = 0.5 + N / 4 \Rightarrow \Theta(n)$
iv. Open Addressing Deletions
Option 1: remove it, re-hash rest of cluster
Option 2: use a deleted "item"
Not a date item, not empty either
We will call this "deleted"
"Deleted" Items
New probing outcome
When a probe finds an item marked deleted
Insert considers it an empty spot (miss) and thus usable (if the key does not exist elsewhere)
Search considers it occupied (full) and keeps searching
Maintaining Bucket Status
To determine whether a bucket is used or ununsed required a bool for each bucket
When a third state is added: used, unused, or deleted
Option 1: Use two bools
Option 2: Declare an enum
enum
: A variable type that can only take on certain "enumerated" values
v. Counting Probes
When collisions are resolved with linear probing, the average number of probes required to search in a hash table of size M that contains $N = \alpha M$ keys is about:
$$\frac{1}{2} (1 + \frac{1}{1 - \alpha})~~\text{for hits}$$$$\frac{1}{2} (1 + \frac{1}{(1 - \alpha)^2})~~\text{for misses}$$Effect of Load on Number of Probes Linear Probing
$\alpha$ | Average Probes: Successful Search | Average Probes: Unsuccessful Search |
---|---|---|
0.1 | 1.1 | 1.1 |
0.2 | 1.1 | 1.3 |
0.3 | 1.2 | 1.5 |
0.4 | 1.3 | 1.9 |
0.5 | 1.5 | 2.5 |
0.6 | 1.8 | 3.6 |
0.7 | 2.2 | 6.1 |
0.8 | 3.0 | 13.0 |
0.9 | 5.5 | 50.5 |
vi. Quadratic Probing
Hash search key to get initial table index
Probe buckets at increasing "distance" from the table index, until search key or an open address is found
Collision resolution
$t(\text{key}) \mod \Rightarrow$ tabel index
If bucket at table index is full, then try
Counting Probes
When collisions are resolved with linear probing, the average number of probes required to search in a hash table of size M that contains $N = \alpha M$ keys is about:
$$1 - \frac{\alpha}{2} + \ln (\frac{1}{1 - \alpha})~~\text{for hits}$$$$\frac{1}{1 - \alpha} - \alpha + \ln (\frac{1}{1 - \alpha})~~\text{for misses}$$Effect of Load on Number of Probes Linear Probing
$\alpha$ | Average Probes: Successful Search | Average Probes: Unsuccessful Search |
---|---|---|
0.1 | 1.06 | 1.12 |
0.2 | 1.12 | 1.27 |
0.3 | 1.21 | 1.49 |
0.4 | 1.44 | 2.19 |
0.5 | 1.62 | 2.82 |
0.6 | 1.62 | 2.82 |
0.7 | 1.85 | 3.84 |
0.8 | 2.21 | 5.81 |
0.9 | 2.85 | 11.40 |
vii. Double Hashing
Hash search key to get initial table index
Use a second has function to calculate an interval used to probe buckets after the table index, until search key or an open address is found
Collision resolution
$t(\text{key}) \mod M \Rightarrow$ table index
If bucket at table index is full, then try
$(t(\text{key}) + j \times t'(key)) \mod M$ for $j = 1, 2, 3, ...$
$t'(\text{key}) = q - (t(\text{key}) \mod q)$ for some prime number $q < M$
i. Introduction
As the number of keys in a hash table increases, search performance degrades
Separate chaining
Search time increases gradually
Doubling number of keys means building (on average) list length at each of M table indices
Linear Probing
Search time increases dramatically as table fills
May reach point when no more keys can be inserted
ii. Simple Dynamic Hashing
Double the size of table when it "fill up"
Choose a load factor of 0.5
Each item must be rehashed into the table
Expensive, but infrequent
Cannot guarantee that each and every operation will be fast, but can guarantee that average cost per operation will be low
Total cost is low, but performance profile is erratic
Most operations are extremely fast, but some operations require much more time to rebuild the table
Concept:
Each insert
Pays (small constant) cost to actually insert
Deposits other small constant ("balance") in a bank
First M / 2 - 1: build up "balance"
(M / 2)-th insertion
Application:
Assume linear probing
Start with table of size M
Each insertion in a table $\le \frac{1}{2}$ full
Insert $M / 2 - 1$ keys
Insert $(M / 2)^{th}$ key
Build new table, size 2M
Remove keys from old table, insert new
Each insert $\le \frac{1}{4}$ full, with average cost less than 1.5 probes
$1.5 * M / 2 = O(M)$
$O(M) + O(M) = O(M)$