Skip to main content
cracking the tech interview system design and algorithms in java 25

Part IV — Hash-Based Structures

4 min read Chapter 28 of 75
Summary

Overview of hash tables and maps — the...

Overview of hash tables and maps — the most frequently tested data structures in technical interviews, covering hashing fundamentals and collision resolution.

Part IV — Hash-Based Structures

If you could take only one data structure into a coding interview, take a hash map. No other structure appears in as many interview solutions, solves as wide a range of problems, or delivers as dramatic a performance improvement over brute force. The two-pointer technique shaves a pass. Binary search cuts the search space in half. A hash map eliminates the search entirely — O(1) average-case lookups turn O(n²) brute-force scans into linear sweeps.

Interviewers know this. They expect you to reach for a hash map when a problem involves frequency counting, duplicate detection, two-sum lookups, or caching intermediate results. What separates strong candidates from average ones is understanding why hash maps deliver O(1) performance, when that guarantee breaks down, and how the JVM implements these structures under the hood.

Why Hashing Works

A hash function transforms a key of arbitrary size into a fixed-size integer — the hash code. That integer serves as an index into an array of buckets. When the hash function distributes keys uniformly across buckets, each bucket holds at most one or two entries, and lookups complete in constant time.

Three properties define a good hash function:

Deterministic. The same input always produces the same hash code. If "hello".hashCode() returns 99162322 once, it returns 99162322 every time. Without determinism, a key stored under one hash code becomes unretrievable under a different one.

Uniform distribution. Keys spread evenly across the bucket array. A hash function that maps 90% of inputs to the same bucket degrades from O(1) to O(n) — you’ve built an expensive linked list.

Avalanche effect. Small changes in the input produce large changes in the hash code. "cat" and "bat" should land in completely different buckets. Without this property, similar keys cluster together, creating hot spots.

Java’s Object.hashCode() contract adds one more rule: if a.equals(b) is true, then a.hashCode() == b.hashCode() is required. Violating this contract causes silent, devastating bugs — a key inserted into a HashMap becomes invisible to get() because the lookup probes the wrong bucket.

When O(1) Breaks Down — Collisions

No hash function is perfect. With a finite number of buckets and an infinite key space, two different keys will eventually hash to the same index. This is a collision, and how you handle it determines the real-world performance of your hash-based structure.

Two families of collision resolution dominate both interviews and production systems:

Separate Chaining stores all colliding entries in a secondary data structure — typically a linked list — at each bucket. Insertions always succeed (append to the chain), but lookups degrade to O(k) where k is the chain length. Java’s HashMap uses this approach, with a critical optimization: when a chain grows beyond 8 nodes, it converts to a Red-Black tree, bounding worst-case lookup to O(log n).

Open Addressing stores all entries directly in the bucket array. When a collision occurs, the algorithm probes subsequent slots — linearly, quadratically, or via a second hash function — until it finds an empty one. This approach avoids pointer overhead and plays well with CPU caches, but performance degrades sharply as the load factor approaches 1.0. Python’s dict uses open addressing with a custom perturbation scheme.

Load Factor — The Performance Knob

The load factor α = n/m (number of entries divided by number of buckets) is the single most important metric for hash table performance. At α = 0.5, half the buckets are occupied and collisions are infrequent. At α = 0.9, nearly every insertion triggers a collision chain. Java’s HashMap defaults to a maximum load factor of 0.75 — a well-tested balance between memory efficiency and lookup speed. When the load factor exceeds this threshold, the table doubles in size and rehashes every entry.

Understanding load factor lets you answer interviewer follow-ups like: “What happens if we insert a billion elements?” or “How would you tune this for memory-constrained environments?”

What Lies Ahead

The two chapters that follow dissect hash-based structures from the inside out:

  1. Hash Tables — Build a hash table from scratch using separate chaining and open addressing. Analyze load factor impact, implement dynamic resizing, and explore Java’s HashMap internals including bucket treeification. Solve classic interview problems: Two Sum, Group Anagrams, and First Non-Repeating Character.

  2. Maps — Master the Map abstraction across its implementations: HashMap, LinkedHashMap, TreeMap, and ConcurrentHashMap. Build an LRU Cache, implement interval scheduling with TreeMap, and write a concurrent word frequency counter with virtual threads.

Each chapter provides complete Java 25 implementations, complexity analysis, and interviewer tips. You will walk into your next interview knowing not only what to use, but why it works and what breaks when things go wrong.