Skip to main content
cracking the tech interview system design and algorithms in java 25

Radix Sort

11 min read Chapter 54 of 75
Summary

Covers LSD (least significant digit) radix sort using...

Covers LSD (least significant digit) radix sort using counting sort as a stable subroutine, achieving O(d(n + k)) time, and comparison with other sorting approaches.

Radix Sort

Counting sort handles integers in a small range. But what about sorting integers up to 1,000,000 or even 2^32? You can’t allocate a count array of size 2^32 and call it a day. Radix sort solves this by decomposing large numbers into individual digits and sorting one digit at a time.

The result: linear-time sorting for fixed-width integers, with no comparison between elements.

The Core Idea

Instead of sorting by the entire value, radix sort sorts by one digit position per pass. After processing all digits, the array is fully sorted.

There are two variants:

  • LSD (Least Significant Digit): process digits from right to left (ones → tens → hundreds). This is the standard version.
  • MSD (Most Significant Digit): process digits from left to right. Better for strings. More complex to implement.

LSD radix sort is the workhorse. Let’s master it first.

Why Least Significant Digit First?

This feels backwards — shouldn’t the most significant digit matter most? The trick is that each digit-level sort must be stable. Stability means equal elements retain their relative order from the previous pass.

Here’s why this works:

  1. After sorting by the ones digit, elements with the same ones digit are grouped together.
  2. When sorting by the tens digit, stability ensures that within each tens-digit group, the ones-digit ordering from the previous pass is preserved.
  3. Each subsequent pass refines the ordering without destroying the work of previous passes.

If the subroutine were unstable, sorting by the tens digit would scramble the ones-digit ordering, and the final result would be wrong.

The Algorithm

  1. Find the maximum number to determine the number of digits d.
  2. For each digit position (ones, tens, hundreds, …):
    • Use counting sort on that digit — base 10 means k = 10.
  3. After d passes, the array is sorted.

Java 25 Implementation

void radixSort(int[] arr) {
    int n = arr.length;
    if (n <= 1) return;

    // Find the maximum to determine number of digits
    int max = arr[0];
    for (int i = 1; i < n; i++) {
        max = Math.max(max, arr[i]);
    }

    // Sort by each digit position: exp = 1 (ones), 10 (tens), 100 (hundreds), ...
    for (int exp = 1; max / exp > 0; exp *= 10) {
        countingSortByDigit(arr, n, exp);
    }
}

void countingSortByDigit(int[] arr, int n, int exp) {
    int[] output = new int[n];
    int[] count = new int[10];  // Digits 0–9

    // Count occurrences of each digit at this position
    for (int i = 0; i < n; i++) {
        int digit = (arr[i] / exp) % 10;
        count[digit]++;
    }

    // Prefix sums
    for (int i = 1; i < 10; i++) {
        count[i] += count[i - 1];
    }

    // Build output array (backwards for stability)
    for (int i = n - 1; i >= 0; i--) {
        int digit = (arr[i] / exp) % 10;
        output[count[digit] - 1] = arr[i];
        count[digit]--;
    }

    // Copy output back to arr
    System.arraycopy(output, 0, arr, 0, n);
}

The expression (arr[i] / exp) % 10 extracts the digit at position exp:

  • exp = 1: ones digit.
  • exp = 10: tens digit (divide by 10, take mod 10).
  • exp = 100: hundreds digit, and so on.

Visual Walkthrough

Input: [170, 45, 75, 90, 802, 24, 2, 66]

Maximum value: 802 → 3 digits → 3 passes.

Pass 1 — Sort by Ones Digit

Extract ones: [0, 5, 5, 0, 2, 4, 2, 6]

After stable counting sort by ones digit:

[170, 90, 802, 2, 24, 45, 75, 66]
  0    0    2   2   4   5   5   6

Elements with ones digit 0 come first (170, 90 in original order — stability), then digit 2 elements (802, 2), etc.

Pass 2 — Sort by Tens Digit

Extract tens: [7, 9, 0, 0, 2, 4, 7, 6]

After stable counting sort by tens digit:

[802, 2, 24, 45, 66, 170, 75, 90]
  0   0   2   4   6    7   7   9

802 and 2 both have tens digit 0. Stability preserves their order from Pass 1 (802 came before 2). Similarly, 170 and 75 both have tens digit 7 — stability keeps 170 before 75.

Pass 3 — Sort by Hundreds Digit

Extract hundreds: [8, 0, 0, 0, 0, 1, 0, 0]

After stable counting sort by hundreds digit:

[2, 24, 45, 66, 75, 90, 170, 802]
 0   0   0   0   0   0    1    8

All elements with hundreds digit 0 appear first, in the order established by Pass 2. The array is now fully sorted.

Analysis

Time Complexity: O(d × (n + k))

Each pass runs counting sort in O(n + k) time, where k is the base (number of possible digit values). With d digit positions:

$$T(n) = O(d \times (n + k))$$

For base 10 with d-digit numbers: O(d × (n + 10)) = O(d × n).

For 32-bit integers using base 256 (byte-level processing):

  • k = 256 (byte values 0–255)
  • d = 4 (a 32-bit integer has 4 bytes)
  • Time: O(4 × (n + 256)) = O(n)

That’s constant-time per element — radix sort provides true linear-time sorting for fixed-width integers.

Formal Comparison with Comparison Sorts

For n integers with max value M:

  • d = ⌈log_k(M)⌉ digits in base k
  • Radix sort: O(d × (n + k)) = O(n × log_k(M) + k × log_k(M))

Choosing k = n: O(n × log_n(M)). When M = n^c (polynomial in n), this becomes O(c × n) = O(n).

Compare with O(n log n) for comparison sorts. Radix sort wins when d is small — many elements with few digits.

Space Complexity: O(n + k)

The counting sort subroutine requires an output array of size n and a count array of size k. These are reused across passes, so total auxiliary space is O(n + k).

Stability: Yes

Radix sort inherits stability from counting sort. It’s crucial that this property holds, because the correctness of LSD radix sort depends on each pass preserving the ordering established by earlier passes.

Base Selection: A Critical Trade-Off

The choice of base k controls the trade-off between number of passes and counting array size:

BaseDigits (32-bit int)Count Array SizePasses
10101010
168168
25642564
65,536265,5362

Base 256 (byte-level) is the sweet spot for 32-bit integers:

  • 4 passes is few enough to be fast.
  • Count array of 256 fits comfortably in L1 cache.
  • Digit extraction uses bitwise operations: (arr[i] >> (8 * pass)) & 0xFF.

Base 65,536 (word-level) cuts passes to 2 but needs a 65K count array. Still practical on modern machines.

Base 10 is the worst choice for performance — 10 passes for a 32-bit integer — but it’s the clearest for teaching and interviews.

Byte-level radix sort in Java:

void radixSort256(int[] arr) {
    int n = arr.length;
    int[] output = new int[n];

    for (int shift = 0; shift < 32; shift += 8) {
        int[] count = new int[256];

        // Count
        for (int val : arr) {
            int digit = (val >> shift) & 0xFF;
            count[digit]++;
        }

        // Prefix sums
        for (int i = 1; i < 256; i++) {
            count[i] += count[i - 1];
        }

        // Place (backwards for stability)
        for (int i = n - 1; i >= 0; i--) {
            int digit = (arr[i] >> shift) & 0xFF;
            output[count[digit] - 1] = arr[i];
            count[digit]--;
        }

        System.arraycopy(output, 0, arr, 0, n);
    }
}

The shift-and-mask approach (val >> shift) & 0xFF is faster than division and modulo, and it naturally processes one byte per pass.

MSD (Most Significant Digit) Radix Sort

MSD radix sort processes digits from left to right and sorts recursively within each bucket:

  1. Sort all elements by the most significant digit into buckets.
  2. Recursively sort each bucket by the next digit.
  3. Concatenate buckets.
void msdRadixSort(int[] arr, int lo, int hi, int exp) {
    if (lo >= hi || exp <= 0) return;

    int[] count = new int[12];  // 10 digits + 2 for boundaries
    int[] output = new int[hi - lo + 1];

    // Count occurrences
    for (int i = lo; i <= hi; i++) {
        int digit = (arr[i] / exp) % 10;
        count[digit + 2]++;
    }

    // Prefix sums
    for (int i = 2; i < 12; i++) {
        count[i] += count[i - 1];
    }

    // Distribute
    for (int i = lo; i <= hi; i++) {
        int digit = (arr[i] / exp) % 10;
        output[count[digit + 1]++] = arr[i];
    }

    System.arraycopy(output, 0, arr, lo, hi - lo + 1);

    // Recursively sort each bucket
    for (int d = 0; d < 10; d++) {
        msdRadixSort(arr, lo + count[d], lo + count[d + 1] - 1, exp / 10);
    }
}

MSD advantages:

  • Short-circuits: single-element buckets don’t need further sorting.
  • Better for strings: lexicographic ordering naturally follows left-to-right digit processing.
  • Can stop early if all remaining elements in a bucket share the same prefix.

MSD disadvantages:

  • More complex implementation (recursive, bucket management).
  • Poor cache performance due to recursive bucket subdivision.
  • LSD is simpler and faster for fixed-width integers.

Handling Negative Numbers

Standard radix sort treats numbers as unsigned. To handle negatives:

void radixSortWithNegatives(int[] arr) {
    // Separate negatives and positives
    int negCount = 0;
    for (int val : arr) {
        if (val < 0) negCount++;
    }

    int[] negatives = new int[negCount];
    int[] positives = new int[arr.length - negCount];
    int ni = 0, pi = 0;

    for (int val : arr) {
        if (val < 0) negatives[ni++] = -val;    // Store absolute values
        else positives[pi++] = val;
    }

    // Sort both halves
    if (negatives.length > 0) radixSort(negatives);
    if (positives.length > 0) radixSort(positives);

    // Merge: reversed negatives, then positives
    int idx = 0;
    for (int i = negatives.length - 1; i >= 0; i--) {
        arr[idx++] = -negatives[i];     // Reverse and negate back
    }
    for (int val : positives) {
        arr[idx++] = val;
    }
}

The negatives, when sorted by absolute value, come out in ascending order of magnitude. Reversing them and negating produces descending negative values, which is the correct sorted order.

When Radix Sort Beats Comparison Sorts

Radix sort outperforms comparison sorts in specific scenarios:

Large n, small d: Sorting 10 million 32-bit integers? Radix sort with base 256 does 4 passes of O(n) work = O(4n). Quicksort does O(n log n) ≈ O(23n). Radix sort wins by ~6×.

Fixed-width integers: 32-bit or 64-bit integers have a bounded number of digits. The d factor becomes a constant, making radix sort genuinely O(n).

Predictable performance: No worst-case blowup like quicksort. No data dependency. Every input takes the same amount of time. This predictability matters in systems with real-time constraints.

When it doesn’t win: If n is small (fewer than ~1000 elements), the constant factors in radix sort (multiple passes, counting array setup) outweigh the O(n log n) cost of comparison sorts. Quicksort with its cache-friendly access pattern is faster for small to medium arrays.

Also, radix sort works poorly when values span an extreme range with few elements. Sorting 100 elements where the maximum is 10^18 requires d = 19 passes in base 10 — far worse than quicksort’s O(100 × 7) ≈ 700 comparisons.

Radix Sort in Practice

Java’s standard library doesn’t include radix sort. But it appears in:

  • Database engines: sorting integer columns with known bounds.
  • Graphics pipelines: sorting depth values for rendering order (fixed-point 24-bit depth → 3 byte-level passes).
  • Suffix arrays: string processing algorithms use MSD radix sort for lexicographic ordering.
  • Network packet processing: sorting packets by integer fields (IP addresses, port numbers).

Interviewer Tips

  • Lead with the insight: radix sort exploits the structure of integers (fixed number of digits) to avoid comparisons entirely. Frame it as “sorting by position, not by comparison.”
  • Explain the stability requirement: when asked “why LSD?” answer that each pass must be stable so that digit-level orderings compose correctly. This is the core correctness argument.
  • Know the complexity precisely: state O(d × (n + k)), then specialize. For 32-bit integers with base 256: O(4 × (n + 256)) = O(n). Interviewers want to see you manipulate the parameters.
  • Draw the walkthrough: the three-pass example with [170, 45, 75, 90, 802, 24, 2, 66] is compact enough to trace on a whiteboard. Showing each pass builds confidence that you understand the mechanics.
  • Compare honestly: radix sort isn’t always faster. When asked “is radix sort always better than quicksort?” answer no — it depends on n, d, k, and cache effects. This nuance demonstrates real understanding beyond textbook recitation.
  • Mention the base trade-off: bring up base 256 vs. base 10 without being prompted. It shows you’ve thought about practical performance, not only asymptotic behavior.