Disjoint Set Union (Union-Find)
SummaryCovers the Union-Find data structure with two optimizations...
Covers the Union-Find data structure with two optimizations...
Covers the Union-Find data structure with two optimizations (path compression and union by rank) achieving O(α(n)) amortized complexity, with applications in graph connectivity and Kruskal's MST.
Disjoint Set Union (Union-Find)
Picture a social network with a million users. Friendships form one at a time — “Alice befriends Bob,” “Carol befriends Dave.” After each friendship, you need to answer: “Are Alice and Dave in the same friend group?” and “How many distinct friend groups exist?”
This is the dynamic connectivity problem, and Union-Find solves it with near-constant time per operation. It’s one of the most elegant data structures in computer science — simple to implement, brutally efficient, and a favorite in interviews.
The Problem
You have n elements, initially each in its own singleton set. Support three operations:
- union(a, b) — merge the sets containing
aandb - find(a) — return the representative (root) of the set containing
a - connected(a, b) — do
aandbbelong to the same set?
With arrays or linked lists, at least one of these operations costs O(n). Union-Find achieves amortized O(α(n)) for all of them — effectively constant time.
The Forest Representation
Union-Find models sets as a forest of trees. Each element has a parent pointer. The root of each tree is the set’s representative — and a root points to itself.
Initial state (5 elements, each its own set):
0 1 2 3 4
↑ ↑ ↑ ↑ ↑
(self-loops: parent[i] = i)
After union(0, 1) and union(2, 3):
0 2
↑ ↑
1 3 4
After union(0, 2):
0
/ \
1 2
↑
3 4
Finding which set an element belongs to means following parent pointers to the root. Two elements are in the same set if and only if they share the same root.
Basic Implementation
public class UnionFind {
private final int[] parent;
public UnionFind(int n) {
parent = new int[n];
for (int i = 0; i < n; i++) {
parent[i] = i; // each element is its own root
}
}
public int find(int x) {
while (parent[x] != x) {
x = parent[x];
}
return x;
}
public void union(int x, int y) {
int rootX = find(x);
int rootY = find(y);
if (rootX != rootY) {
parent[rootX] = rootY; // attach one root under the other
}
}
public boolean connected(int x, int y) {
return find(x) == find(y);
}
}
This works — but it has a fatal flaw. Consider union operations that always chain elements linearly: union(0,1), union(1,2), union(2,3), … The tree degrades into a linked list of height n, making find cost O(n). Two optimizations fix this.
Optimization 1: Path Compression
During find, every node you visit on the path to the root ends up pointing directly to the root. Future find calls on any of these nodes return in O(1).
Recursive version (elegant one-liner):
public int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]); // path compression
}
return parent[x];
}
Iterative version (no stack overflow risk):
public int find(int x) {
int root = x;
while (parent[root] != root) {
root = parent[root];
}
// Compress path
while (parent[x] != root) {
int next = parent[x];
parent[x] = root;
x = next;
}
return root;
}
Before and after path compression on find(4):
Before: After:
0 0
↑ / | \
1 1 2 4
↑ ↑
2 3
↑
3
↑
4
Every node on the path from 4 to 0 now points directly to root 0. The tree flattens dramatically.
Path compression alone reduces the amortized cost to O(log n) per operation. But combining it with the second optimization pushes performance even further.
Optimization 2: Union by Rank
The problem with naive union is that you attach trees without considering their heights. Union by rank always attaches the shorter tree under the taller tree’s root, keeping the overall structure shallow.
rank[i] is an upper bound on the height of the subtree rooted at i. When two roots have equal rank, pick either as the new root and increment its rank by 1.
public void union(int x, int y) {
int rootX = find(x);
int rootY = find(y);
if (rootX == rootY) return;
if (rank[rootX] < rank[rootY]) {
parent[rootX] = rootY;
} else if (rank[rootX] > rank[rootY]) {
parent[rootY] = rootX;
} else {
parent[rootY] = rootX;
rank[rootX]++;
}
}
Why “rank” and not “height”? Path compression changes actual tree heights, but updating heights precisely would cost too much. Rank serves as a frozen upper bound — it only increases during equal-rank unions, never decreases during path compression. This keeps the bookkeeping O(1).
Alternative: Union by Size — attach the smaller tree (fewer elements) under the larger tree’s root. This provides the same asymptotic guarantee and has the advantage that size[root] always reflects the actual component size.
Both Together: O(α(n)) Amortized
With path compression and union by rank (or size), every operation runs in amortized O(α(n)) time, where α is the inverse Ackermann function.
The Ackermann function grows astronomically fast — faster than any primitive recursive function. Its inverse, α(n), grows correspondingly slowly:
| n | α(n) |
|---|---|
| 1 | 0 |
| 2 | 1 |
| 4 | 2 |
| 16 | 3 |
| 65,536 | 4 |
| 2^65,536 | 5 |
For any value of n that fits in the observable universe, α(n) ≤ 4. This means Union-Find operations are effectively constant time — not “fast O(log n),” but genuinely, practically O(1).
The proof of this bound (due to Tarjan) is intricate and beyond interview scope. What matters: state the bound confidently, explain that α(n) never exceeds 5 for any practical input, and note that it requires both optimizations together.
Complete Implementation
public class UnionFind {
private final int[] parent;
private final int[] rank;
private int components;
public UnionFind(int n) {
parent = new int[n];
rank = new int[n];
components = n;
for (int i = 0; i < n; i++) {
parent[i] = i;
// rank[i] = 0 — already default
}
}
/** Find root with path compression. */
public int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]);
}
return parent[x];
}
/** Union by rank. Returns true if a merge occurred. */
public boolean union(int x, int y) {
int rootX = find(x);
int rootY = find(y);
if (rootX == rootY) return false;
if (rank[rootX] < rank[rootY]) {
parent[rootX] = rootY;
} else if (rank[rootX] > rank[rootY]) {
parent[rootY] = rootX;
} else {
parent[rootY] = rootX;
rank[rootX]++;
}
components--;
return true;
}
/** Check if two elements are in the same set. */
public boolean connected(int x, int y) {
return find(x) == find(y);
}
/** Return the number of disjoint sets. */
public int getComponents() {
return components;
}
}
Key design decisions:
unionreturnsboolean— useful for detecting redundant edges (cycle detection).componentstracks the current count of disjoint sets, decrementing on each successful merge.- Recursive
findwith path compression — clean and correct for interview sizes (stack depth ≤ O(log n) even before compression kicks in).
Applications
Number of Connected Components
Problem: Given n nodes and a list of undirected edges, find the number of connected components.
int countComponents(int n, int[][] edges) {
var uf = new UnionFind(n);
for (int[] edge : edges) {
uf.union(edge[0], edge[1]);
}
return uf.getComponents();
}
Each edge triggers one union call. After processing all edges, getComponents() returns the answer. Time: O(n + m·α(n)) where m is the number of edges.
Kruskal’s Minimum Spanning Tree
Problem: Find the minimum spanning tree of a weighted, undirected graph.
int kruskalMST(int n, int[][] edges) {
// edges[i] = [u, v, weight]
Arrays.sort(edges, (a, b) -> Integer.compare(a[2], b[2]));
var uf = new UnionFind(n);
int totalWeight = 0;
int edgesUsed = 0;
for (int[] edge : edges) {
if (uf.union(edge[0], edge[1])) {
totalWeight += edge[2];
edgesUsed++;
if (edgesUsed == n - 1) break; // MST complete
}
}
return totalWeight;
}
Sort edges by weight, then greedily add each edge if it connects two different components. Union-Find’s union returns false for edges that would create a cycle. Time: O(m log m + m·α(n)).
Accounts Merge
Problem: Given a list of accounts where each account is a name followed by emails, merge accounts that share any email.
List<List<String>> accountsMerge(List<List<String>> accounts) {
var uf = new UnionFind(accounts.size());
Map<String, Integer> emailToId = new HashMap<>();
// Union accounts sharing emails
for (int i = 0; i < accounts.size(); i++) {
for (int j = 1; j < accounts.get(i).size(); j++) {
String email = accounts.get(i).get(j);
if (emailToId.containsKey(email)) {
uf.union(i, emailToId.get(email));
} else {
emailToId.put(email, i);
}
}
}
// Group emails by root account
Map<Integer, TreeSet<String>> merged = new HashMap<>();
for (var entry : emailToId.entrySet()) {
int root = uf.find(entry.getValue());
merged.computeIfAbsent(root, _ -> new TreeSet<>()).add(entry.getKey());
}
// Build result
List<List<String>> result = new ArrayList<>();
for (var entry : merged.entrySet()) {
var list = new ArrayList<String>();
list.add(accounts.get(entry.getKey()).getFirst());
list.addAll(entry.getValue());
result.add(list);
}
return result;
}
The insight: treat each account as a node. If two accounts share an email, union them. Then group all emails under their root account.
Redundant Connection
Problem: A graph with n nodes and n edges (one extra edge beyond a tree). Find the edge that, if removed, leaves a tree.
int[] findRedundantConnection(int[][] edges) {
var uf = new UnionFind(edges.length + 1); // 1-based nodes
for (int[] edge : edges) {
if (!uf.union(edge[0], edge[1])) {
return edge; // this edge creates a cycle
}
}
return new int[0]; // unreachable for valid input
}
The first edge where union returns false is the redundant connection — it tried to connect two already-connected nodes.
Number of Islands II
Problem: Given a grid, land cells are added one at a time. After each addition, report the number of islands.
List<Integer> numIslands2(int m, int n, int[][] positions) {
var uf = new UnionFind(m * n);
boolean[][] grid = new boolean[m][n];
int islands = 0;
List<Integer> result = new ArrayList<>();
int[][] dirs = {{0,1},{0,-1},{1,0},{-1,0}};
for (int[] pos : positions) {
int r = pos[0], c = pos[1];
if (grid[r][c]) { // duplicate position
result.add(islands);
continue;
}
grid[r][c] = true;
islands++;
int id = r * n + c;
for (int[] d : dirs) {
int nr = r + d[0], nc = c + d[1];
if (nr >= 0 && nr < m && nc >= 0 && nc < n && grid[nr][nc]) {
if (uf.union(id, nr * n + nc)) {
islands--;
}
}
}
result.add(islands);
}
return result;
}
Each new land cell starts as its own island. Then check all four neighbors — if a neighbor is land, union them. Each successful union reduces the island count by 1. This runs in O(k·α(m·n)) for k additions.
Weighted Union-Find
In some problems, you need to track a relative value between elements. For example: “A is 2 kg heavier than B, B is 3 kg heavier than C — how much heavier is A than C?”
Weighted Union-Find stores a weight on each parent edge, representing the value difference between a node and its parent.
public class WeightedUnionFind {
private final int[] parent;
private final int[] rank;
private final double[] weight; // weight[i] = value(i) - value(parent[i])
public WeightedUnionFind(int n) {
parent = new int[n];
rank = new int[n];
weight = new double[n];
for (int i = 0; i < n; i++) parent[i] = i;
}
public int find(int x) {
if (parent[x] != x) {
int root = find(parent[x]);
weight[x] += weight[parent[x]]; // accumulate weights along path
parent[x] = root;
}
return parent[x];
}
/** Union x and y with the relationship: value(x) / value(y) = w */
public boolean union(int x, int y, double w) {
int rootX = find(x);
int rootY = find(y);
if (rootX == rootY) return false;
if (rank[rootX] < rank[rootY]) {
parent[rootX] = rootY;
weight[rootX] = w - weight[x] + weight[y];
} else if (rank[rootX] > rank[rootY]) {
parent[rootY] = rootX;
weight[rootY] = -w + weight[x] - weight[y];
} else {
parent[rootY] = rootX;
weight[rootY] = -w + weight[x] - weight[y];
rank[rootX]++;
}
return true;
}
/** Returns value(x) / value(y), or -1.0 if not connected. */
public double query(int x, int y) {
if (find(x) != find(y)) return -1.0;
return weight[x] - weight[y];
}
}
This variant powers the “Evaluate Division” problem (given equations like a/b = 2.0, b/c = 3.0, answer queries like a/c = ?). Path compression accumulates weights along the compressed path, maintaining correct relative values.
Edge Cases and Interviewer Tips
Edge cases to handle:
- Unioning an element with itself —
find(x) == find(y), so no merge occurs. Yourunionshould handle this gracefully (returnfalse). - Single element — works correctly; it’s its own root with rank 0.
- Already-connected elements —
unionshould detect this and skip the merge, which is critical for cycle detection. - Large input — Union-Find handles millions of elements because the amortized cost per operation is O(α(n)) ≈ O(1).
What interviewers want to hear:
- Both optimizations are needed for O(α(n)). Path compression alone gives O(log n). Union by rank alone gives O(log n). Together they achieve the tighter bound.
- Clearly state that α(n) ≤ 5 for any practical input size. Don’t hand-wave — mention the inverse Ackermann function by name.
- Explain why you chose union by rank vs. union by size. Both work. Union by size has the advantage of tracking actual component sizes, which some problems require.
- When asked “what’s the space complexity?” — O(n) for the parent and rank arrays.
- Know that Union-Find does not support efficient split (disconnecting a set into two). If the problem requires splits, consider a different approach like link-cut trees.
Time complexity summary:
- find: O(α(n)) amortized
- union: O(α(n)) amortized
- connected: O(α(n)) amortized
- Space: O(n)
Union-Find is deceptively powerful. Two short arrays and two simple optimizations give you a structure that handles dynamic connectivity faster than almost anything else you could build. Master it, and an entire category of graph problems becomes straightforward.