EECS 281: Data Structures and Algorithms

Section 16 Minimum Spanning Trees

University of Michigan at Ann Arbor

Last Edit Date: 06/07/2023

Disclaimer and Term of Use:

We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file

This personal note is adapted from Professor Marcus Darden and David Paoletti. Please contact us to delete this file if you think your rights have been violated.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Minimum Spanning Trees Introduction¶

i. The Minimum Spanning Tree Problem

Given: edge-weighted, undirected graph $G = (V, E)$

Find: subgraph $T = (V, E')$, $E' \subseteq E$ such that

All vertices are pair-wise connected
The sum of all edge weights in T is minimal

See a cycle in T?

Remove edge with highest weight

Therefore, T must be a tree (no cycles)

Example: Data Centers Connection

ii. Create a MST

Minimum Spanning Tree (MST) if:
- All vertices are pair-wise connected
- The sum of all edge weights in T is minimal

Step 1	Step 2	Step 3

Pirm's Algorithm¶

Find MST on edge-weighted, connected, undirected graphs
Greedily select edges one by one and add to a growing sub-graph
Grows a tree from a single vertex

Steps to do Pirm's Algorithm:

Initializa a tree with a single vertex, chosen arbitarily from the graph
Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the tree, find the minimum-weight edge, and add that vertex to the tree
Repeat step 2 (until all vertices are in the tree)

Given graph G = (V, E)
Start with 2 sets of vertices: 'innies' & 'outies'
'Innies' are visited nodes (initially empty)
'Outies' are not yet visited (initially V)
Select first innie arbitrarily (root of MST)
Repeat until no more outies
Choose outie (v') with smallest distance from any innie
Move v' from outies to innies
Implementation issue: use linear search or PQ?

Data Structures:

Three vectors
- Better: a vector of classes or structures!
For each vertex v, record:
- $k_v$: Has v been visited? (initially false for all $v \in V$)
- $d_v$: What is the minimal edge weight to v? (initially $\infty$ for all $v \in V$, except $v_r = 0$)
- $p_v$: What vertex precedes (is parent of) v? (initially unknown for all $v \in V$)

Pirm's (Linear Search) Algorithm

Set starting point dv to 0.
Loop v times (until every k_v is true):                // |V| times
1. From the set of vertices for which k_v is 
   false, select the vertex v having the 
   smallest tentative distance d_v.                    // O(|V|)
2. Set k_v to true.                                    // O(1)
3. For each vertex w adjacent to v for 
   which k_w is false, test whether dw is 
   greater than distance(v,w). If it is, 
   set d_w to distance(v,w) and set p_w to v.          // Most at this vertex: O(|V|). Cost of each: O(1)

Implementing Prims's Algorithm

Implement in the order listed:
- 1: Loop over all vertices: find smallest false $k_v$
- 2: Mark $k_v$ as true
- 3: Loop over all vertices: update false neighbors of $k_v$
Common Mistake: Set the first vertex to true outside the loop
Reordering this can result in a simple algorithm that simply doesn’t work

Step 1	Step 2	Step 3

Step 4	Step 5	Step 6

Step 7

Prim's (Heap) Algorithm

Set starting point dv to 0.
Loop v times (until every k_v is true):                // |E| times
1. From the set of vertices for which k_v is 
   false, select the vertex v having the 
   smallest tentative distance d_v.                    // O(log |E|)
2. Set k_v to true.                                    // O(1)
3. For each vertex w adjacent to v for 
   which k_w is false, test whether dw is 
   greater than distance(v,w). If it is, 
   set d_w to distance(v,w) and set p_w to v.          // Most at this vertex: O(|V|). Cost of each: O(log (|V|))
                                                       // Visits every edge once (over all iterations) = O(|E|)

Algorithm Prims_Heaps(G, s0)
    //Initialize
    n = |V|                                      // O(1)
    create_table(n) //stores k,d,p               // O(v)
    create_pq() //empty heap                     // O(1)
    table[s0].d = 0                              // O(1)
    insert_pq(0, s0)                             // O(1)
    while (!pq.isempty)                          // O(E)
        v0 = getMin() //heap top() & pop()       // O(log E)
        if (!table[v0].k) //not known            // O(1)
            table[v0].k = true                   // O(1)
            for each vi Î Adj[v0]                // O(1 + E / V)
                distance = weight(v0, vi)        // O(1)
                if (distance < table[vi].d)      // O(1)
                    table[vi].d = distance       // O(1)
                    table[vi].p = v0             // O(1) 
                    insert_pq(distance, vi)      // O(log E)

Complexity Summary

$O(V^2)$ for the simplest nested-loop implementation
$O(E \log E)$ with heaps

Kruskal’s Algorithm¶

Find an MST on edge-weighted, connected, undirected graphs
Greedily select edges one by one and add to a growing sub-graph
Grows a forest of trees that eventually merges into a single tree

Steps to do Kruskal's Algorithm

Presort all edges: $O(E \log E) \approx O(E \log V)$ time
Try inserting in order of increasing weight
Some edges will be discarded so as not to create cycles

Initial edges may be disjoint
- Kruskal's grows a forest (union of disjoint trees)

Step 1	Step 2	Step 3

Step 4	Step 5	Step 6

Complexity Analysis

Sorting takes $E \log E$
- Happens to be the bottleneck of entire algorithm
Remaining work: a loop over E edges
- Discarding an edge is trivial $O(1)$
- Adding an edge is easy $O(1)$
- Most time spent testing for cycles
- Good news: takes less than $\log E \approx \log V$
Key idea: if vertices $v_i$ and $v_j$ are connected, then a new edge would create a cycle
- Only need to maintain disjoint sets

Maintaining Disjoint Sets

N locations with no connecting roads
Roads are added one by one
- Distances are unimportant (for now)
- Connectivity is important
Want to connect cities ASAP
Redundant roads would slow us down

Question: For two cities k and j, would road (k, j) be redundant?

Answer: Use a Union-Find data structure.

MST Summary¶

MST is lowest-cost sub-graph that
- Includes all nodes in a graph
- Keeps all nodes connected
Two algorithms to find MST
- Prim: iteratively adds closest node to current tree - very similar to Dijkstra, $O(V^2)$ or $O(E \log V)$
- Kruskal: iteratively builds forest by adding minimal edges, $O(E \log V)$
For dense G, use the nested-loop Prim variant
For sparse G, Kruskal is faster
- Relies on the efficiency of sorting algorithms
- Relies on the efficiency of union-find