Skip to main content
cracking the tech interview system design and algorithms in java 25

Tries

13 min read Chapter 37 of 75
Summary

Covers trie node design, insert/search/startsWith operations, space optimization...

Covers trie node design, insert/search/startsWith operations, space optimization with compressed tries, and applications in autocomplete and word search.

Tries

When you need to look up words by prefix, check if a string exists in a dictionary, or build an autocomplete engine, hash maps fall short. A hash map can tell you if “apple” exists, but it cannot efficiently answer “what words start with ‘app’?” — not without scanning every key. Tries solve this exact class of problems by organizing strings character by character into a tree structure where shared prefixes share the same path.

What Is a Trie?

A trie (pronounced “try,” from retrieval) is a tree-shaped data structure where:

  • Each node represents a single character in a string.
  • The path from the root to a node spells out a prefix.
  • Nodes that mark the end of a complete word carry a boolean flag.

For the words “cat”, “car”, “card”, and “care”, the trie looks like:

        (root)
          |
          c
          |
          a
         / \
        t    r
            / \
           d   e

“cat” follows root → c → a → t. “car” follows root → c → a → r. “card” and “care” branch from the ‘r’ node. The prefix “ca” corresponds to the path root → c → a, which exists in the trie — so startsWith("ca") returns true immediately.

Trie Node Design

Two standard approaches for storing children:

Array-based (when the alphabet is fixed and small, e.g., lowercase English letters):

class TrieNode {
    TrieNode[] children = new TrieNode[26];
    boolean isEndOfWord = false;
}

Fast lookups — index directly by ch - 'a'. Wastes memory when most slots stay null, but for dense dictionaries over a small alphabet, the constant-time access is worth it.

Map-based (when the alphabet is large or variable — Unicode, mixed case, digits):

class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
    boolean isEndOfWord = false;
}

Uses memory proportional to actual children, not alphabet size. Slightly slower per lookup due to hashing overhead, but handles arbitrary character sets.

Choose array-based for interview problems (they almost always use lowercase letters). Choose map-based for production systems with diverse character sets.

Complete Trie Implementation

public class Trie {
    private final TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    public void insert(String word) {
        TrieNode current = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (current.children[index] == null) {
                current.children[index] = new TrieNode();
            }
            current = current.children[index];
        }
        current.isEndOfWord = true;
    }

    public boolean search(String word) {
        TrieNode node = traverse(word);
        return node != null && node.isEndOfWord;
    }

    public boolean startsWith(String prefix) {
        return traverse(prefix) != null;
    }

    private TrieNode traverse(String s) {
        TrieNode current = root;
        for (int i = 0; i < s.length(); i++) {
            int index = s.charAt(i) - 'a';
            if (current.children[index] == null) {
                return null;
            }
            current = current.children[index];
        }
        return current;
    }

    private static class TrieNode {
        TrieNode[] children = new TrieNode[26];
        boolean isEndOfWord = false;
    }
}

The traverse helper is the backbone. Both search and startsWith use it — the only difference is whether they check isEndOfWord. This factoring eliminates duplicate code and makes the implementation clean.

Complexity Analysis

Time complexity: O(L) for insert, search, and startsWith, where L is the length of the word or prefix. Each operation processes exactly one character per level.

Space complexity: O(ALPHABET_SIZE × L × N) in the worst case, where N is the number of words. Every word of length L creates up to L new nodes, each holding an array of size ALPHABET_SIZE (26 for lowercase). For 10,000 words averaging 10 characters, that is up to 100,000 nodes × 26 pointers each — meaningful memory consumption.

This space cost is the primary trade-off of tries. They buy you O(L) prefix operations at the expense of higher memory usage compared to a flat hash map.

Space Optimization: Compressed Tries

The Problem with Standard Tries

When a chain of nodes each has only one child, a standard trie wastes space storing single-child nodes. The word “internationalization” creates 20 nodes even though the first 20 characters form one unbroken chain.

Compressed Trie (Radix Tree)

A compressed trie merges single-child chains into single edges labeled with multiple characters:

Standard trie for "romane", "romanus", "romulus":

    root → r → o → m → a → n → e
                            → u → s
                  → u → l → u → s

Compressed trie:

    root → "rom" → "an" → "e"
                        → "us"
                → "ulus"

Each edge stores a substring instead of a single character. Internal nodes exist only where strings diverge. This dramatically reduces node count for datasets with long shared prefixes.

The trade-off: insertion and splitting logic become more complex. You need to handle edge splitting when a new word diverges mid-edge. For interview problems, a standard trie suffices. For production autocomplete systems processing millions of entries, compressed tries save significant memory.

Map-Based vs Array-Based Trade-offs

AspectArray [26]HashMap
Lookup speedO(1) index arithmeticO(1) amortized hash
Memory per nodeFixed 26 pointersProportional to actual children
Best forDense tries, fixed alphabetSparse tries, Unicode
Cache behaviorBetter localityHash table overhead
Interview recommendationDefault choiceUse when problem involves mixed characters

Classic Interview Problems

Implement Trie (LeetCode 208)

The implementation above solves this directly. Key points interviewers watch for:

  • Clean separation of traverse from search/startsWith.
  • Correct handling of empty strings.
  • Understanding that search checks isEndOfWord but startsWith does not.

Add and Search Word with Wildcard ’.’

Problem: design a data structure that supports addWord(word) and search(word) where search can contain '.' matching any character.

public class WordDictionary {
    private final TrieNode root;

    public WordDictionary() {
        root = new TrieNode();
    }

    public void addWord(String word) {
        TrieNode current = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (current.children[index] == null) {
                current.children[index] = new TrieNode();
            }
            current = current.children[index];
        }
        current.isEndOfWord = true;
    }

    public boolean search(String word) {
        return dfs(root, word, 0);
    }

    private boolean dfs(TrieNode node, String word, int pos) {
        if (pos == word.length()) {
            return node.isEndOfWord;
        }
        char ch = word.charAt(pos);
        if (ch == '.') {
            for (TrieNode child : node.children) {
                if (child != null && dfs(child, word, pos + 1)) {
                    return true;
                }
            }
            return false;
        }
        int index = ch - 'a';
        if (node.children[index] == null) {
            return false;
        }
        return dfs(node.children[index], word, pos + 1);
    }

    private static class TrieNode {
        TrieNode[] children = new TrieNode[26];
        boolean isEndOfWord = false;
    }
}

The wildcard '.' triggers a branch in the DFS — you explore all 26 children. Without wildcards, the search follows a single path. This means worst-case complexity for a query like ”…” (all dots) is O(26^L), but practical queries with few wildcards stay fast.

Word Search II (Boggle Board)

Problem: given an m × n board of characters and a list of words, find all words that can be constructed from adjacent cells (up, down, left, right). Each cell can be used at most once per word.

Strategy: build a trie from the word list, then run backtracking DFS from every cell using the trie to prune paths early.

public class WordSearchII {
    private final int[][] directions = {{0, 1}, {0, -1}, {1, 0}, {-1, 0}};

    public List<String> findWords(char[][] board, String[] words) {
        TrieNode root = buildTrie(words);
        var result = new ArrayList<String>();
        int rows = board.length;
        int cols = board[0].length;

        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                dfs(board, r, c, root, result);
            }
        }
        return result;
    }

    private void dfs(char[][] board, int r, int c, TrieNode node, List<String> result) {
        if (r < 0 || r >= board.length || c < 0 || c >= board[0].length) {
            return;
        }
        char ch = board[r][c];
        if (ch == '#' || node.children[ch - 'a'] == null) {
            return;
        }

        node = node.children[ch - 'a'];
        if (node.word != null) {
            result.add(node.word);
            node.word = null; // avoid duplicates
        }

        board[r][c] = '#'; // mark visited
        for (int[] dir : directions) {
            dfs(board, r + dir[0], c + dir[1], node, result);
        }
        board[r][c] = ch; // restore
    }

    private TrieNode buildTrie(String[] words) {
        TrieNode root = new TrieNode();
        for (String word : words) {
            TrieNode current = root;
            for (int i = 0; i < word.length(); i++) {
                int index = word.charAt(i) - 'a';
                if (current.children[index] == null) {
                    current.children[index] = new TrieNode();
                }
                current = current.children[index];
            }
            current.word = word;
        }
        return root;
    }

    private static class TrieNode {
        TrieNode[] children = new TrieNode[26];
        String word = null; // stores complete word at end nodes
    }
}

The trie provides two critical advantages over checking each word independently:

  1. Shared prefix pruning: words like “oath” and “oat” share the same path for three characters. You traverse that path once instead of twice.
  2. Early termination: if no word in the list starts with the current prefix, you stop exploring that path immediately.

Without the trie, you would run a separate DFS for each word — O(words × m × n × 4^L). With the trie, you run one DFS per cell, guided by the trie structure, achieving dramatically better practical performance.

Autocomplete System

Problem: design a system where users type characters one at a time, and you return the top 3 most-frequently-typed sentences that match the current prefix.

public class AutocompleteSystem {
    private final TrieNode root;
    private TrieNode currentNode;
    private final StringBuilder currentInput;

    public AutocompleteSystem(String[] sentences, int[] times) {
        root = new TrieNode();
        currentNode = root;
        currentInput = new StringBuilder();

        for (int i = 0; i < sentences.length; i++) {
            insertWithFrequency(sentences[i], times[i]);
        }
    }

    public List<String> input(char c) {
        if (c == '#') {
            insertWithFrequency(currentInput.toString(), 1);
            currentInput.setLength(0);
            currentNode = root;
            return List.of();
        }

        currentInput.append(c);
        if (currentNode != null) {
            int index = c == ' ' ? 26 : c - 'a';
            currentNode = currentNode.children[index];
        }

        if (currentNode == null) {
            return List.of();
        }

        var candidates = new PriorityQueue<Map.Entry<String, Integer>>(
            (a, b) -> a.getValue().equals(b.getValue())
                ? b.getKey().compareTo(a.getKey())
                : a.getValue() - b.getValue()
        );

        collectAll(currentNode, candidates);

        var result = new ArrayList<String>();
        while (!candidates.isEmpty()) {
            result.addFirst(candidates.poll().getKey());
        }
        return result;
    }

    private void collectAll(TrieNode node,
                            PriorityQueue<Map.Entry<String, Integer>> heap) {
        if (node.sentence != null) {
            heap.offer(Map.entry(node.sentence, node.frequency));
            if (heap.size() > 3) {
                heap.poll();
            }
        }
        for (TrieNode child : node.children) {
            if (child != null) {
                collectAll(child, heap);
            }
        }
    }

    private void insertWithFrequency(String sentence, int count) {
        TrieNode current = root;
        for (int i = 0; i < sentence.length(); i++) {
            int index = sentence.charAt(i) == ' ' ? 26 : sentence.charAt(i) - 'a';
            if (current.children[index] == null) {
                current.children[index] = new TrieNode();
            }
            current = current.children[index];
        }
        current.sentence = sentence;
        current.frequency += count;
    }

    private static class TrieNode {
        TrieNode[] children = new TrieNode[27]; // 26 letters + space
        String sentence = null;
        int frequency = 0;
    }
}

The currentNode pointer tracks the user’s position in the trie as they type. Each new character advances the pointer one level deeper. When the user presses ’#’, the completed sentence is recorded with an incremented frequency. This avoids retraversing from the root on every keystroke.

Longest Common Prefix

Strategy: insert all strings into a trie, then walk from the root as long as each node has exactly one child and is not an end-of-word.

public String longestCommonPrefix(String[] strs) {
    if (strs.length == 0) return "";

    var trie = new Trie();
    for (String s : strs) {
        if (s.isEmpty()) return "";
        trie.insert(s);
    }

    var prefix = new StringBuilder();
    TrieNode current = trie.root;

    while (true) {
        int childCount = 0;
        int nextIndex = -1;
        for (int i = 0; i < 26; i++) {
            if (current.children[i] != null) {
                childCount++;
                nextIndex = i;
            }
        }
        if (childCount != 1 || current.isEndOfWord) {
            break;
        }
        prefix.append((char) ('a' + nextIndex));
        current = current.children[nextIndex];
    }

    return prefix.toString();
}

When a node has more than one child, the common prefix ends — strings diverge at that point. When a node is marked isEndOfWord, the shortest string ends there, so the prefix cannot extend further.

Trie vs HashMap for Prefix Operations

OperationTrieHashMap
Exact lookupO(L)O(L) average
Prefix search (“starts with”)O(L)O(N × L) scan all keys
Autocomplete (all words with prefix)O(L + results)O(N × L)
Lexicographic orderingNatural (DFS)Requires sorting
MemoryHigher (node overhead)Lower (flat storage)
Wildcard matchingNatural (DFS branching)Requires regex or custom logic

Use a trie when: prefix queries dominate, you need autocomplete functionality, you need wildcard matching, or you want lexicographic ordering without sorting.

Use a HashMap when: you only need exact lookups, memory is constrained, or the number of strings is small enough that scanning all keys for prefix matches is acceptable.

Practical Applications

  • Autocomplete: search engines, IDE code completion, mobile keyboards.
  • Spell checkers: store a dictionary in a trie, check if typed words exist, suggest corrections by exploring nearby paths.
  • IP routing tables: longest prefix match for routing decisions uses a trie over binary representations of IP addresses.
  • DNA sequence analysis: tries over the alphabet {A, C, G, T} for pattern matching in genomics.
  • Word games: Boggle solvers, Scrabble helpers, crossword generators.

Edge Cases and Interviewer Tips

  • Empty string: inserting or searching for "" should work — the root itself is the end-of-word marker. Make sure your implementation handles this gracefully.
  • Case sensitivity: clarify with the interviewer. If the problem says “lowercase English letters,” use array [26]. If it says “alphanumeric,” you need a larger array or a map.
  • Unicode: for production systems, always use map-based children. Array-based approaches do not scale to Unicode’s ~150,000 characters.
  • Memory pressure: if the interviewer asks about memory optimization, mention compressed tries, and discuss the array-vs-map trade-off. You can also mention that in practice, ternary search tries offer a middle ground between memory and performance.
  • Deletion: deleting from a trie requires care — you unset isEndOfWord and then prune any nodes that are no longer part of other words. Walk back up from the deleted word’s terminal node, removing nodes that have no children and are not end-of-word markers for other strings.
  • Thread safety: standard trie implementations are not thread-safe. For concurrent access, consider a concurrent hash map for children or read-write locks at the node level.

When you see a problem involving string prefixes, dictionary operations, or pattern matching with wildcards, a trie should be your first instinct. State that to your interviewer early — it demonstrates pattern recognition and shows you know which data structure fits which problem shape.