Skip to main content
cracking the tech interview system design and algorithms in java 25

Hash Tables

15 min read Chapter 29 of 75
Summary

Implements a hash table with separate chaining and...

Implements a hash table with separate chaining and open addressing (linear probing, quadratic probing, double hashing), analyzes load factor impact, and covers Java's HashMap internals including treeification.

Hash Tables

Building a hash table from scratch is one of the most revealing exercises in computer science. It forces you to confront every design decision that production implementations hide behind a clean API: how to compute bucket indices, how to handle collisions, when to resize, and how to maintain O(1) amortized performance as the table grows. Interviewers who ask you to “implement a HashMap” are testing this depth.

This chapter builds two complete hash table implementations in Java 25 — one using separate chaining, one using open addressing — then peels back the lid on java.util.HashMap to reveal the engineering decisions that make it fast in practice.

Hash Function Design

The hash function is the foundation. A poor hash function turns an O(1) data structure into an O(n) one. Three hashing strategies appear frequently in interviews and systems:

Modular Hashing

The most intuitive approach: compute key.hashCode() % capacity. The result is a valid bucket index. The problem? If the capacity is not prime, patterns in hash codes produce systematic collisions. Keys with hash codes that are multiples of the capacity all land in bucket 0.

int index = Math.floorMod(key.hashCode(), capacity);

Using Math.floorMod instead of % avoids negative indices — a trap that catches many candidates. The % operator in Java preserves the sign of the dividend, so (-7) % 4 returns -3, not 1.

Multiplicative Hashing

Multiply the hash code by an irrational constant (Knuth recommends the golden ratio φ = 0.6180339887…), take the fractional part, and scale to the table size:

int index = (int) (capacity * ((key.hashCode() * 0.6180339887) % 1.0));

This distributes keys more uniformly than modular hashing because it avoids the sensitivity to table size.

Java’s hashCode() Contract

Every class that overrides equals() must override hashCode(). The contract:

  1. If a.equals(b), then a.hashCode() == b.hashCode().
  2. If !a.equals(b), their hash codes should differ (but are not required to).
  3. hashCode() must return the same value across invocations within a single JVM run (assuming no mutable fields change).

Violating rule 1 is catastrophic. Insert a key, mutate a field used in hashCode(), and the key becomes permanently lost in the table — get() probes the wrong bucket, finds nothing, and returns null. Interviewers love this question.

Separate Chaining Implementation

Separate chaining stores collisions in a linked list at each bucket. The implementation is straightforward and forgiving — it handles high load factors gracefully because chains grow without limit.

public class ChainingHashTable<K, V> {

    record Entry<K, V>(K key, V value, Entry<K, V> next) {
        Entry<K, V> withValue(V newValue) {
            return new Entry<>(key, newValue, next);
        }

        Entry<K, V> withNext(Entry<K, V> newNext) {
            return new Entry<>(key, value, newNext);
        }
    }

    private Entry<K, V>[] buckets;
    private int size;
    private static final double MAX_LOAD_FACTOR = 0.75;

    @SuppressWarnings("unchecked")
    public ChainingHashTable(int capacity) {
        this.buckets = new Entry[capacity];
        this.size = 0;
    }

    public ChainingHashTable() {
        this(16);
    }

    private int bucketIndex(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void put(K key, V value) {
        if ((double) size / buckets.length >= MAX_LOAD_FACTOR) {
            resize();
        }

        int index = bucketIndex(key);
        Entry<K, V> current = buckets[index];

        // Update existing key
        Entry<K, V> prev = null;
        Entry<K, V> node = current;
        while (node != null) {
            if (node.key().equals(key)) {
                // Replace value — rebuild the chain with the updated entry
                Entry<K, V> updated = node.withValue(value);
                if (prev == null) {
                    buckets[index] = updated.withNext(node.next());
                } else {
                    // Rebuild chain up to this point
                    buckets[index] = rebuildChain(current, node, updated);
                }
                return;
            }
            prev = node;
            node = node.next();
        }

        // Insert new key at head of chain
        buckets[index] = new Entry<>(key, value, current);
        size++;
    }

    private Entry<K, V> rebuildChain(
            Entry<K, V> head, Entry<K, V> target, Entry<K, V> replacement) {
        if (head == target) {
            return replacement.withNext(target.next());
        }
        return head.withNext(rebuildChain(head.next(), target, replacement));
    }

    public V get(K key) {
        int index = bucketIndex(key);
        Entry<K, V> current = buckets[index];
        while (current != null) {
            if (current.key().equals(key)) {
                return current.value();
            }
            current = current.next();
        }
        return null;
    }

    public boolean remove(K key) {
        int index = bucketIndex(key);
        Entry<K, V> current = buckets[index];
        Entry<K, V> prev = null;

        while (current != null) {
            if (current.key().equals(key)) {
                if (prev == null) {
                    buckets[index] = current.next();
                } else {
                    buckets[index] = rebuildChain(
                        buckets[index], current, current.next());
                }
                size--;
                return true;
            }
            prev = current;
            current = current.next();
        }
        return false;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Entry<K, V>[] oldBuckets = buckets;
        buckets = new Entry[oldBuckets.length * 2];
        size = 0;

        for (Entry<K, V> head : oldBuckets) {
            Entry<K, V> current = head;
            while (current != null) {
                put(current.key(), current.value());
                current = current.next();
            }
        }
    }

    public int size() {
        return size;
    }
}

The Entry record uses Java 25’s record feature for a compact, immutable linked list node. Each entry holds a key, a value, and a reference to the next entry in the chain. The withValue and withNext methods create updated copies without mutating the original — a pattern that makes reasoning about the chain’s state predictable.

Why the Head Insertion Matters

New entries insert at the head of the chain (new Entry<>(key, value, current)). This runs in O(1) regardless of chain length. Tail insertion would require traversing the entire chain — O(k) for a chain of length k — with no benefit.

Open Addressing: Linear Probing

Open addressing stores all entries directly in the bucket array, eliminating pointer overhead. When a collision occurs, the algorithm probes subsequent slots until it finds an empty one.

The Clustering Problem

Linear probing checks index, index+1, index+2, … until it finds an empty slot. This creates primary clustering: occupied slots form contiguous runs. Each new insertion that lands near a cluster extends it, making future collisions more likely. At high load factors, a single cluster can span half the table.

Probe Sequence

For a key with hash h and table size m:

  • Linear probing: probe(i) = (h + i) mod m
  • Quadratic probing: probe(i) = (h + i² ) mod m — breaks up primary clusters but creates secondary clustering (keys with the same hash follow identical probe sequences)
  • Double hashing: probe(i) = (h₁ + i × h₂) mod m — uses a second hash function to determine the step size, producing the most uniform distribution of all three

Linear Probing Implementation

public class LinearProbingHashTable<K, V> {

    record Slot<K, V>(K key, V value) {}

    private static final Object TOMBSTONE = new Object();
    private Object[] table;  // Slot<K,V> or TOMBSTONE or null
    private int size;
    private int capacity;
    private static final double MAX_LOAD_FACTOR = 0.5;

    public LinearProbingHashTable(int capacity) {
        this.capacity = capacity;
        this.table = new Object[capacity];
        this.size = 0;
    }

    public LinearProbingHashTable() {
        this(16);
    }

    private int hash(K key) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public void put(K key, V value) {
        if ((double) size / capacity >= MAX_LOAD_FACTOR) {
            resize();
        }

        int index = hash(key);
        int firstTombstone = -1;

        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            Object slot = table[probe];

            if (slot == null) {
                // Empty slot — insert here (or at earlier tombstone)
                int insertAt = firstTombstone != -1 ? firstTombstone : probe;
                table[insertAt] = new Slot<>(key, value);
                size++;
                return;
            }

            if (slot == TOMBSTONE) {
                if (firstTombstone == -1) firstTombstone = probe;
                continue;
            }

            @SuppressWarnings("unchecked")
            Slot<K, V> existing = (Slot<K, V>) slot;
            if (existing.key().equals(key)) {
                table[probe] = new Slot<>(key, value);
                return;  // Update, don't increment size
            }
        }
    }

    @SuppressWarnings("unchecked")
    public V get(K key) {
        int index = hash(key);

        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            Object slot = table[probe];

            if (slot == null) return null;
            if (slot == TOMBSTONE) continue;

            Slot<K, V> entry = (Slot<K, V>) slot;
            if (entry.key().equals(key)) return entry.value();
        }
        return null;
    }

    public boolean remove(K key) {
        int index = hash(key);

        for (int i = 0; i < capacity; i++) {
            int probe = (index + i) % capacity;
            Object slot = table[probe];

            if (slot == null) return false;
            if (slot == TOMBSTONE) continue;

            @SuppressWarnings("unchecked")
            Slot<K, V> entry = (Slot<K, V>) slot;
            if (entry.key().equals(key)) {
                table[probe] = TOMBSTONE;
                size--;
                return true;
            }
        }
        return false;
    }

    private void resize() {
        Object[] oldTable = table;
        capacity *= 2;
        table = new Object[capacity];
        size = 0;

        for (Object slot : oldTable) {
            if (slot != null && slot != TOMBSTONE) {
                @SuppressWarnings("unchecked")
                Slot<K, V> entry = (Slot<K, V>) slot;
                put(entry.key(), entry.value());
            }
        }
    }

    public int size() {
        return size;
    }
}

The Tombstone Pattern

Deletion in open addressing is tricky. You cannot set a slot to null because that breaks the probe sequence — a get() for a key that was placed after the deleted slot would stop probing too early, thinking no more entries exist. Instead, mark deleted slots with a sentinel value (TOMBSTONE). During lookup, skip tombstones and keep probing. During insertion, reuse tombstone slots to reclaim space.

The trade-off: tombstones accumulate. After many insertions and deletions, the table fills with tombstones that slow every probe sequence. Periodic full rehashing — rebuilding the table from scratch — clears them out.

Why 0.5 Load Factor?

Open addressing uses a stricter load factor (0.5) than separate chaining (0.75). At α = 0.5, the expected number of probes for a successful search is about 1.5. At α = 0.75, it jumps to 2.5. At α = 0.9, it reaches 5.5 probes on average, and performance craters.

Load Factor and Dynamic Resizing

The load factor α = n/m determines the expected number of collisions. For separate chaining, the average chain length equals α. For linear probing, the expected number of probes for an unsuccessful search is approximately 1/(1 − α)².

Load Factor (α)Chaining: avg chain lengthLinear Probing: avg probes (miss)
0.250.251.8
0.500.502.5
0.750.758.5
0.900.9050.5

When α exceeds the threshold, the table must resize. The standard approach:

  1. Allocate a new array of double the current capacity.
  2. Rehash every entry into the new array (hash codes remain the same, but bucket indices change because the modulus changes).
  3. Replace the old array.

This single resize costs O(n), but it happens infrequently. Amortized over all insertions, the cost per insertion is O(1). The proof: after a resize to capacity 2m, you need m more insertions before the next resize. Spreading the O(m) resize cost over m insertions gives O(1) per insertion.

Java HashMap Deep Dive

java.util.HashMap is the most frequently used data structure in Java codebases, and understanding its internals gives you an edge in interviews.

Structure

  • Bucket array: An array of Node<K,V> references, initially length 16.
  • Load factor: 0.75 (configurable).
  • Threshold: capacity × load factor. When size > threshold, the table doubles.

Hash Perturbation

Java does not use hashCode() % capacity directly. It applies a perturbation function:

static int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This XORs the upper 16 bits of the hash code with the lower 16 bits. Why? HashMap uses (capacity - 1) & hash to compute the bucket index (capacity is always a power of 2, so this is equivalent to hash % capacity but faster). With small capacities like 16, only the bottom 4 bits of the hash determine the bucket. The perturbation mixes information from the upper bits into the lower bits, dramatically reducing collisions for hash codes that differ only in their upper bits.

Treeification

Here is the optimization that separates HashMap from a textbook implementation. When a bucket’s chain grows to 8 or more nodes, HashMap converts the chain into a Red-Black tree. This bounds the worst-case lookup from O(n) to O(log n).

Why 8? The probability of a chain reaching 8 nodes under random hashing follows a Poisson distribution with λ = 0.5 (at load factor 0.75). A chain of 8 occurs with probability 0.00000006 — one in 16 million. Treeification is a safety net for pathological inputs, not normal operation.

When the tree shrinks to 6 or fewer nodes (during removal), it converts back to a linked list. The gap between 8 and 6 prevents oscillation — a chain of 7 does not repeatedly convert back and forth.

Null Key Handling

HashMap allows one null key (stored at index 0) and any number of null values. TreeMap does not allow null keys because it needs to compare them. This is a common interview trivia question.

Classic Interview Problems

Two Sum

Problem: Given an array of integers and a target, return indices of two numbers that add up to the target.

This is the quintessential hash map problem. The brute force checks all pairs in O(n²). A hash map stores each number’s index and checks for the complement in O(1).

int[] twoSum(int[] nums, int target) {
    var seen = new java.util.HashMap<Integer, Integer>();

    for (int i = 0; i < nums.length; i++) {
        int complement = target - nums[i];
        if (seen.containsKey(complement)) {
            return new int[]{seen.get(complement), i};
        }
        seen.put(nums[i], i);
    }
    throw new IllegalArgumentException("No two sum solution");
}

Key insight: You do not need two passes. By checking the complement before inserting the current number, you handle duplicates correctly and use a single pass.

Complexity: O(n) time, O(n) space.

Group Anagrams

Problem: Given an array of strings, group the anagrams together.

Two strings are anagrams if they have the same character frequencies. The hash map key is a canonical representation of those frequencies.

List<List<String>> groupAnagrams(String[] strs) {
    var groups = new java.util.HashMap<String, List<String>>();

    for (String s : strs) {
        char[] sorted = s.toCharArray();
        java.util.Arrays.sort(sorted);
        String key = new String(sorted);

        groups.computeIfAbsent(key, _ -> new java.util.ArrayList<>()).add(s);
    }

    return new java.util.ArrayList<>(groups.values());
}

Alternative key: Instead of sorting (O(k log k) per string), use a frequency count array converted to a string: "#1#0#0#...#0" for 26 letters — O(k) per string. For short strings, sorting is faster in practice due to lower constant factors.

Complexity: O(n × k log k) with sorting, O(n × k) with frequency counting, where k is the maximum string length.

First Non-Repeating Character

Problem: Given a string, find the first character that does not repeat.

int firstUniqChar(String s) {
    var count = new java.util.LinkedHashMap<Character, Integer>();

    for (char c : s.toCharArray()) {
        count.merge(c, 1, Integer::sum);
    }

    for (var entry : count.entrySet()) {
        if (entry.getValue() == 1) {
            return s.indexOf(entry.getKey());
        }
    }
    return -1;
}

Using LinkedHashMap preserves insertion order, so the first entry with count 1 is the answer. The merge method with Integer::sum increments the count — cleaner than getOrDefault followed by put.

Complexity: O(n) time, O(1) space (the map holds at most 26 lowercase English letters — constant).

Complexity Comparison

OperationSeparate Chaining (avg)Separate Chaining (worst)Linear Probing (avg)Linear Probing (worst)Double Hashing (avg)Double Hashing (worst)
InsertO(1)O(n)O(1)O(n)O(1)O(n)
SearchO(1)O(n)O(1)O(n)O(1)O(n)
DeleteO(1)O(n)O(1)O(n)O(1)O(n)
SpaceO(n + m)O(m)O(m)

Worst-case O(n) occurs when all keys hash to the same bucket. Java’s HashMap addresses this with treeification, reducing worst-case search to O(log n).

Edge Cases and Interviewer Tips

  • Null keys: HashMap allows one null key. If the interviewer’s test case includes null, handle it explicitly.
  • Mutable keys: Never use a mutable object as a hash map key. If you modify a field used in hashCode() after insertion, the entry becomes unreachable. Interviewers test this understanding.
  • Integer overflow in hashCode(): Overflow is fine — Java’s int wraps around silently, and the result is still a valid (if potentially negative) integer. Use Math.floorMod to handle negative values.
  • Custom objects as keys: Always override both equals() and hashCode(). Overriding only one violates the contract and causes bugs that are extraordinarily hard to track down.
  • Resizing cost: If you know the number of elements upfront, pass it as the initial capacity to HashMap to avoid resizing: new HashMap<>(expectedSize * 4 / 3 + 1). This accounts for the 0.75 load factor.
  • Thread safety: HashMap is not thread-safe. If the interviewer asks about concurrent access, pivot to ConcurrentHashMap (covered in the next chapter).
  • When the interviewer says “optimize”: After solving with a HashMap, discuss the space cost. For problems with bounded input (like lowercase English letters), an array of size 26 replaces the HashMap with zero overhead.