Disjoint Set Union (Union-Find)

Picture a social network with a million users. Friendships form one at a time — “Alice befriends Bob,” “Carol befriends Dave.” After each friendship, you need to answer: “Are Alice and Dave in the same friend group?” and “How many distinct friend groups exist?”

This is the dynamic connectivity problem, and Union-Find solves it with near-constant time per operation. It’s one of the most elegant data structures in computer science — simple to implement, brutally efficient, and a favorite in interviews.

The Problem

You have n elements, initially each in its own singleton set. Support three operations:

union(a, b) — merge the sets containing a and b
find(a) — return the representative (root) of the set containing a
connected(a, b) — do a and b belong to the same set?

With arrays or linked lists, at least one of these operations costs O(n). Union-Find achieves amortized O(α(n)) for all of them — effectively constant time.

The Forest Representation

Union-Find models sets as a forest of trees. Each element has a parent pointer. The root of each tree is the set’s representative — and a root points to itself.

Initial state (5 elements, each its own set):
  0    1    2    3    4
  ↑    ↑    ↑    ↑    ↑
 (self-loops: parent[i] = i)

After union(0, 1) and union(2, 3):
  0    2
  ↑    ↑
  1    3    4

After union(0, 2):
    0
   / \
  1   2
      ↑
      3         4

Finding which set an element belongs to means following parent pointers to the root. Two elements are in the same set if and only if they share the same root.

Basic Implementation

public class UnionFind {
    private final int[] parent;

    public UnionFind(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // each element is its own root
        }
    }

    public int find(int x) {
        while (parent[x] != x) {
            x = parent[x];
        }
        return x;
    }

    public void union(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX != rootY) {
            parent[rootX] = rootY; // attach one root under the other
        }
    }

    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}

This works — but it has a fatal flaw. Consider union operations that always chain elements linearly: union(0,1), union(1,2), union(2,3), … The tree degrades into a linked list of height n, making find cost O(n). Two optimizations fix this.

Optimization 1: Path Compression

During find, every node you visit on the path to the root ends up pointing directly to the root. Future find calls on any of these nodes return in O(1).

Recursive version (elegant one-liner):

public int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]); // path compression
    }
    return parent[x];
}

Iterative version (no stack overflow risk):

public int find(int x) {
    int root = x;
    while (parent[root] != root) {
        root = parent[root];
    }
    // Compress path
    while (parent[x] != root) {
        int next = parent[x];
        parent[x] = root;
        x = next;
    }
    return root;
}

Before and after path compression on find(4):

Before:           After:
    0                0
    ↑              / | \
    1             1   2  4
    ↑                 ↑
    2                 3
    ↑
    3
    ↑
    4

Every node on the path from 4 to 0 now points directly to root 0. The tree flattens dramatically.

Path compression alone reduces the amortized cost to O(log n) per operation. But combining it with the second optimization pushes performance even further.

Optimization 2: Union by Rank

The problem with naive union is that you attach trees without considering their heights. Union by rank always attaches the shorter tree under the taller tree’s root, keeping the overall structure shallow.

rank[i] is an upper bound on the height of the subtree rooted at i. When two roots have equal rank, pick either as the new root and increment its rank by 1.

public void union(int x, int y) {
    int rootX = find(x);
    int rootY = find(y);
    if (rootX == rootY) return;

    if (rank[rootX] < rank[rootY]) {
        parent[rootX] = rootY;
    } else if (rank[rootX] > rank[rootY]) {
        parent[rootY] = rootX;
    } else {
        parent[rootY] = rootX;
        rank[rootX]++;
    }
}

Why “rank” and not “height”? Path compression changes actual tree heights, but updating heights precisely would cost too much. Rank serves as a frozen upper bound — it only increases during equal-rank unions, never decreases during path compression. This keeps the bookkeeping O(1).

Alternative: Union by Size — attach the smaller tree (fewer elements) under the larger tree’s root. This provides the same asymptotic guarantee and has the advantage that size[root] always reflects the actual component size.

Both Together: O(α(n)) Amortized

With path compression and union by rank (or size), every operation runs in amortized O(α(n)) time, where α is the inverse Ackermann function.

The Ackermann function grows astronomically fast — faster than any primitive recursive function. Its inverse, α(n), grows correspondingly slowly:

n	α(n)
1	0
2	1
4	2
16	3
65,536	4
2^65,536	5

For any value of n that fits in the observable universe, α(n) ≤ 4. This means Union-Find operations are effectively constant time — not “fast O(log n),” but genuinely, practically O(1).

The proof of this bound (due to Tarjan) is intricate and beyond interview scope. What matters: state the bound confidently, explain that α(n) never exceeds 5 for any practical input, and note that it requires both optimizations together.

Complete Implementation

public class UnionFind {
    private final int[] parent;
    private final int[] rank;
    private int components;

    public UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        components = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            // rank[i] = 0 — already default
        }
    }

    /** Find root with path compression. */
    public int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]);
        }
        return parent[x];
    }

    /** Union by rank. Returns true if a merge occurred. */
    public boolean union(int x, int y) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX == rootY) return false;

        if (rank[rootX] < rank[rootY]) {
            parent[rootX] = rootY;
        } else if (rank[rootX] > rank[rootY]) {
            parent[rootY] = rootX;
        } else {
            parent[rootY] = rootX;
            rank[rootX]++;
        }
        components--;
        return true;
    }

    /** Check if two elements are in the same set. */
    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }

    /** Return the number of disjoint sets. */
    public int getComponents() {
        return components;
    }
}

Key design decisions:

union returns boolean — useful for detecting redundant edges (cycle detection).
components tracks the current count of disjoint sets, decrementing on each successful merge.
Recursive find with path compression — clean and correct for interview sizes (stack depth ≤ O(log n) even before compression kicks in).

Applications

Number of Connected Components

Problem: Given n nodes and a list of undirected edges, find the number of connected components.

int countComponents(int n, int[][] edges) {
    var uf = new UnionFind(n);
    for (int[] edge : edges) {
        uf.union(edge[0], edge[1]);
    }
    return uf.getComponents();
}

Each edge triggers one union call. After processing all edges, getComponents() returns the answer. Time: O(n + m·α(n)) where m is the number of edges.

Kruskal’s Minimum Spanning Tree

Problem: Find the minimum spanning tree of a weighted, undirected graph.

int kruskalMST(int n, int[][] edges) {
    // edges[i] = [u, v, weight]
    Arrays.sort(edges, (a, b) -> Integer.compare(a[2], b[2]));
    var uf = new UnionFind(n);
    int totalWeight = 0;
    int edgesUsed = 0;

    for (int[] edge : edges) {
        if (uf.union(edge[0], edge[1])) {
            totalWeight += edge[2];
            edgesUsed++;
            if (edgesUsed == n - 1) break; // MST complete
        }
    }
    return totalWeight;
}

Sort edges by weight, then greedily add each edge if it connects two different components. Union-Find’s union returns false for edges that would create a cycle. Time: O(m log m + m·α(n)).

Accounts Merge

Problem: Given a list of accounts where each account is a name followed by emails, merge accounts that share any email.

List<List<String>> accountsMerge(List<List<String>> accounts) {
    var uf = new UnionFind(accounts.size());
    Map<String, Integer> emailToId = new HashMap<>();

    // Union accounts sharing emails
    for (int i = 0; i < accounts.size(); i++) {
        for (int j = 1; j < accounts.get(i).size(); j++) {
            String email = accounts.get(i).get(j);
            if (emailToId.containsKey(email)) {
                uf.union(i, emailToId.get(email));
            } else {
                emailToId.put(email, i);
            }
        }
    }

    // Group emails by root account
    Map<Integer, TreeSet<String>> merged = new HashMap<>();
    for (var entry : emailToId.entrySet()) {
        int root = uf.find(entry.getValue());
        merged.computeIfAbsent(root, _ -> new TreeSet<>()).add(entry.getKey());
    }

    // Build result
    List<List<String>> result = new ArrayList<>();
    for (var entry : merged.entrySet()) {
        var list = new ArrayList<String>();
        list.add(accounts.get(entry.getKey()).getFirst());
        list.addAll(entry.getValue());
        result.add(list);
    }
    return result;
}

The insight: treat each account as a node. If two accounts share an email, union them. Then group all emails under their root account.

Redundant Connection

Problem: A graph with n nodes and n edges (one extra edge beyond a tree). Find the edge that, if removed, leaves a tree.

int[] findRedundantConnection(int[][] edges) {
    var uf = new UnionFind(edges.length + 1); // 1-based nodes
    for (int[] edge : edges) {
        if (!uf.union(edge[0], edge[1])) {
            return edge; // this edge creates a cycle
        }
    }
    return new int[0]; // unreachable for valid input
}

The first edge where union returns false is the redundant connection — it tried to connect two already-connected nodes.

Number of Islands II

Problem: Given a grid, land cells are added one at a time. After each addition, report the number of islands.

List<Integer> numIslands2(int m, int n, int[][] positions) {
    var uf = new UnionFind(m * n);
    boolean[][] grid = new boolean[m][n];
    int islands = 0;
    List<Integer> result = new ArrayList<>();
    int[][] dirs = {{0,1},{0,-1},{1,0},{-1,0}};

    for (int[] pos : positions) {
        int r = pos[0], c = pos[1];
        if (grid[r][c]) { // duplicate position
            result.add(islands);
            continue;
        }
        grid[r][c] = true;
        islands++;
        int id = r * n + c;

        for (int[] d : dirs) {
            int nr = r + d[0], nc = c + d[1];
            if (nr >= 0 && nr < m && nc >= 0 && nc < n && grid[nr][nc]) {
                if (uf.union(id, nr * n + nc)) {
                    islands--;
                }
            }
        }
        result.add(islands);
    }
    return result;
}

Each new land cell starts as its own island. Then check all four neighbors — if a neighbor is land, union them. Each successful union reduces the island count by 1. This runs in O(k·α(m·n)) for k additions.

Weighted Union-Find

In some problems, you need to track a relative value between elements. For example: “A is 2 kg heavier than B, B is 3 kg heavier than C — how much heavier is A than C?”

Weighted Union-Find stores a weight on each parent edge, representing the value difference between a node and its parent.

public class WeightedUnionFind {
    private final int[] parent;
    private final int[] rank;
    private final double[] weight; // weight[i] = value(i) - value(parent[i])

    public WeightedUnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        weight = new double[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    public int find(int x) {
        if (parent[x] != x) {
            int root = find(parent[x]);
            weight[x] += weight[parent[x]]; // accumulate weights along path
            parent[x] = root;
        }
        return parent[x];
    }

    /** Union x and y with the relationship: value(x) / value(y) = w */
    public boolean union(int x, int y, double w) {
        int rootX = find(x);
        int rootY = find(y);
        if (rootX == rootY) return false;

        if (rank[rootX] < rank[rootY]) {
            parent[rootX] = rootY;
            weight[rootX] = w - weight[x] + weight[y];
        } else if (rank[rootX] > rank[rootY]) {
            parent[rootY] = rootX;
            weight[rootY] = -w + weight[x] - weight[y];
        } else {
            parent[rootY] = rootX;
            weight[rootY] = -w + weight[x] - weight[y];
            rank[rootX]++;
        }
        return true;
    }

    /** Returns value(x) / value(y), or -1.0 if not connected. */
    public double query(int x, int y) {
        if (find(x) != find(y)) return -1.0;
        return weight[x] - weight[y];
    }
}

This variant powers the “Evaluate Division” problem (given equations like a/b = 2.0, b/c = 3.0, answer queries like a/c = ?). Path compression accumulates weights along the compressed path, maintaining correct relative values.

Edge Cases and Interviewer Tips

Edge cases to handle:

Unioning an element with itself — find(x) == find(y), so no merge occurs. Your union should handle this gracefully (return false).
Single element — works correctly; it’s its own root with rank 0.
Already-connected elements — union should detect this and skip the merge, which is critical for cycle detection.
Large input — Union-Find handles millions of elements because the amortized cost per operation is O(α(n)) ≈ O(1).

What interviewers want to hear:

Both optimizations are needed for O(α(n)). Path compression alone gives O(log n). Union by rank alone gives O(log n). Together they achieve the tighter bound.
Clearly state that α(n) ≤ 5 for any practical input size. Don’t hand-wave — mention the inverse Ackermann function by name.
Explain why you chose union by rank vs. union by size. Both work. Union by size has the advantage of tracking actual component sizes, which some problems require.
When asked “what’s the space complexity?” — O(n) for the parent and rank arrays.
Know that Union-Find does not support efficient split (disconnecting a set into two). If the problem requires splits, consider a different approach like link-cut trees.

Time complexity summary:

find: O(α(n)) amortized
union: O(α(n)) amortized
connected: O(α(n)) amortized
Space: O(n)

Union-Find is deceptively powerful. Two short arrays and two simple optimizations give you a structure that handles dynamic connectivity faster than almost anything else you could build. Master it, and an entire category of graph problems becomes straightforward.