Skip to main content
fast by design

Choosing the Right Collection: Access Patterns, Memory Layout, and the Benchmark That Settles the Argument

11 min read Chapter 25 of 90

Choosing the Right Collection: Access Patterns, Memory Layout, and the Benchmark That Settles the Argument

The ArrayList vs LinkedList debate is not a debate. It is a benchmark. And the benchmark ended the argument years ago, but developers keep choosing LinkedList because an algorithm textbook told them that linked list insertion is O(1). It is O(1) in terms of pointer manipulation. It is catastrophic in terms of cache misses, memory overhead, and real-world wall clock time.

The content platform manages ordered lists of articles everywhere. The homepage feed. Search results. User reading history. Category listings. Recommendation ranked lists. Every one of these is a List<Article> or a Deque<Article> in memory. Choosing the wrong collection implementation costs microseconds per operation, and when operations number in the millions per second, those microseconds become the bottleneck.

This chapter benchmarks every common List and Deque implementation in the JDK, measures their actual memory consumption per element, explains why memory layout determines performance more than algorithmic complexity, and applies the results to the content platform’s most critical data paths.

Why Big-O Notation Lies to You

Big-O describes growth rate. It does not describe performance. ArrayList’s add(int index, E element) at position 0 is O(n) because it shifts every element. LinkedList’s add(int index, E element) at position 0 is O(1) because it rewires two pointers. On paper, LinkedList wins for front insertion.

In practice, ArrayList’s System.arraycopy on a contiguous memory block runs at memory bandwidth speed. The CPU prefetcher sees a sequential access pattern and loads cache lines before they are needed. The entire shift operation runs from L1/L2 cache for lists under ~100k elements.

LinkedList’s O(1) pointer operation first requires finding the node. The add(int index, E element) method traverses from the head or tail to reach position index. Each traversal step dereferences a pointer to a heap-allocated Node object located at an unpredictable address. Each dereference is a potential L2 or L3 cache miss, costing 5-15ns on modern hardware. For a list of 10,000 elements, inserting at position 5,000 requires 5,000 pointer chases.

The real cost model is:

ArrayList.add(index, element):  O(n) shifts at ~0.5ns per element (cache-friendly)
LinkedList.add(index, element): O(index) traversals at ~5-15ns per hop (cache-hostile)

For ArrayList to lose on a middle insertion, the list needs to be large enough that the shift cost (which scales with n) exceeds the traversal cost (which scales with index, at 10-30x the per-element cost). The crossover point is higher than most developers expect.

Collection memory layout: ArrayList vs LinkedList

The diagram shows the fundamental difference. ArrayList stores element references in a contiguous array. The CPU prefetcher loads adjacent cache lines ahead of the access pattern, so iterating an ArrayList of 10,000 elements involves near-zero cache misses on the backing array itself. LinkedList stores each element in a Node object containing the element reference, a next pointer, and a prev pointer. These nodes are scattered across the heap. Each .next dereference is a random memory access that defeats the prefetcher.

The JMH Benchmark That Settles It

@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 2)
@Measurement(iterations = 5, time = 5)
@Fork(2)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class ListAccessPatternBenchmark {

    @Param({"100", "10000", "1000000"})
    private int size;

    private List<String> arrayList;
    private List<String> linkedList;

    @Setup
    public void setup() {
        arrayList = new ArrayList<>(size);
        linkedList = new LinkedList<>();
        for (int i = 0; i < size; i++) {
            String element = "article-" + i;
            arrayList.add(element);
            linkedList.add(element);
        }
    }

    // --- Iteration ---

    @Benchmark
    public int arrayListIteration() {
        int count = 0;
        for (String s : arrayList) {
            count += s.length();
        }
        return count;
    }

    @Benchmark
    public int linkedListIteration() {
        int count = 0;
        for (String s : linkedList) {
            count += s.length();
        }
        return count;
    }

    // --- Random access ---

    @Benchmark
    public String arrayListRandomAccess() {
        return arrayList.get(ThreadLocalRandom.current().nextInt(size));
    }

    @Benchmark
    public String linkedListRandomAccess() {
        return linkedList.get(ThreadLocalRandom.current().nextInt(size));
    }

    // --- Middle insertion ---

    @Benchmark
    public void arrayListMiddleInsert(Blackhole bh) {
        List<String> list = new ArrayList<>(arrayList);
        list.add(size / 2, "inserted");
        bh.consume(list);
    }

    @Benchmark
    public void linkedListMiddleInsert(Blackhole bh) {
        List<String> list = new LinkedList<>(linkedList);
        list.add(size / 2, "inserted");
        bh.consume(list);
    }
}

Results on a 3.4 GHz machine with 32KB L1d cache, 256KB L2, 12MB L3:

OperationSizeArrayList (ns)LinkedList (ns)Ratio
Iteration1001804202.3x slower
Iteration10,00014,50089,0006.1x slower
Iteration1,000,0001,450,00018,200,00012.5x slower
Random access100521042x slower
Random access10,000518,4003,680x slower
Random access1,000,00051,840,000368,000x slower
Middle insert100953403.6x slower
Middle insert10,0004,80035,2007.3x slower
Middle insert1,000,000580,0006,100,00010.5x slower

Key observations:

Iteration: ArrayList’s advantage grows with size. At 100 elements, the entire backing array fits in two cache lines. At 1M elements, ArrayList streams through memory sequentially while LinkedList pointer-chases across the heap. The 12.5x gap at 1M is the prefetcher doing its job for ArrayList and being useless for LinkedList.

Random access: ArrayList’s get(index) is a single array index operation regardless of size. LinkedList’s get(index) traverses from head or tail (whichever is closer), making it O(n/2). At 1M elements, this is 500,000 pointer dereferences.

Middle insertion: Even though LinkedList’s node insertion is O(1) pointer manipulation, the O(n/2) traversal to find the insertion point dominates. ArrayList’s System.arraycopy shift is faster because it operates on contiguous memory.

Memory Overhead Per Element

The cost difference extends beyond CPU time. Each collection implementation consumes a different amount of memory per element.

// Memory layout analysis
// Object header: 12 bytes (compressed oops) + padding

// ArrayList: backed by Object[]
// Per element overhead: 4 bytes (one reference in the array, compressed oops)
// Plus: array object header (16 bytes) + length field (4 bytes)
// Wasted: unused capacity slots (arrayList.size() <= capacity)
// Formula: 20 + capacity * 4 bytes for the backing array

// LinkedList: each element wrapped in a Node
// Node layout:
//   12 bytes object header
//    4 bytes padding
//    8 bytes 'item' reference (or 4 compressed)
//    8 bytes 'next' reference (or 4 compressed)
//    8 bytes 'prev' reference (or 4 compressed)
//   = 40 bytes per Node (with compressed oops: 32 bytes)
// Per element overhead: 32 bytes (compressed oops) vs 4 bytes for ArrayList
// That is 8x the memory per element

For the content platform’s article feed with 10,000 articles:

ArrayList:  20 + 10,000 * 4 = 40,020 bytes  (~39 KB)
LinkedList: 10,000 * 32     = 320,000 bytes  (~312 KB)

LinkedList uses 8x more memory for the same data. That memory overhead increases GC pressure (more objects to trace and collect), reduces effective cache capacity (the L3 cache holds fewer useful elements), and increases the working set size that must stay resident to avoid page faults.

The Content Platform Article Feed

The content platform’s homepage displays articles in ranked order. The ranking engine produces a List<RankedArticle> sorted by score. The rendering pipeline iterates this list sequentially. The search handler returns results by relevance. The recommendation engine merges multiple ranked lists.

Every one of these operations is iteration-dominant with occasional random access for pagination:

// SLOW: LinkedList for a read-heavy, iterate-dominant workload
public class ArticleFeedService {

    // "We might need to insert in the middle when merging feeds"
    private final List<RankedArticle> feed = new LinkedList<>(); // SLOW

    public void updateFeed(List<RankedArticle> ranked) {
        feed.clear();
        feed.addAll(ranked);
    }

    public List<RankedArticle> getPage(int page, int pageSize) {
        int start = page * pageSize;
        int end = Math.min(start + pageSize, feed.size());
        return feed.subList(start, end); // O(n) traversal to reach 'start'
    }

    public RankedArticle getArticle(int rank) {
        return feed.get(rank); // O(n) traversal
    }

    public int totalArticles() {
        return feed.size();
    }
}
// FAST: ArrayList for the same workload
public class ArticleFeedService {

    private final List<RankedArticle> feed = new ArrayList<>(); // FAST

    public void updateFeed(List<RankedArticle> ranked) {
        feed.clear();
        if (feed instanceof ArrayList<RankedArticle> al) {
            al.ensureCapacity(ranked.size());
        }
        feed.addAll(ranked);
    }

    public List<RankedArticle> getPage(int page, int pageSize) {
        int start = page * pageSize;
        int end = Math.min(start + pageSize, feed.size());
        return feed.subList(start, end); // O(1) index math
    }

    public RankedArticle getArticle(int rank) {
        return feed.get(rank); // O(1) array index
    }

    public int totalArticles() {
        return feed.size();
    }
}

The only scenario where LinkedList could theoretically help is frequent insertion and removal at both ends without ever indexing into the middle. For that pattern, ArrayDeque is still better, as Section 2 will demonstrate.

When ArrayList Costs You: The Grow-and-Copy Trap

ArrayList is not free of performance traps. The backing array starts at capacity 10 (or the value you specify in the constructor) and grows by 50% when full. Each growth allocates a new, larger array and copies all elements with System.arraycopy. For a list that grows from 0 to 1,000,000 elements via repeated add() calls:

// SLOW: default capacity, 30+ resize-and-copy operations
List<Article> articles = new ArrayList<>(); // initial capacity 10
for (int i = 0; i < 1_000_000; i++) {
    articles.add(article); // triggers arraycopy at 10, 15, 22, 33, ...
}

The resize sequence: 10, 15, 22, 33, 49, 73, 109, … until it exceeds 1,000,000. That is approximately 30 resize operations, each copying an increasingly large array. The total bytes copied across all resizes is approximately 2.5x the final array size.

// FAST: pre-sized capacity, zero resize operations
List<Article> articles = new ArrayList<>(1_000_000); // exact capacity
for (int i = 0; i < 1_000_000; i++) {
    articles.add(article); // no arraycopy, just index increment
}

When you know the final size (or a reasonable upper bound), use the capacity constructor.

@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 2)
@Measurement(iterations = 5, time = 5)
@Fork(2)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class ArrayListSizingBenchmark {

    @Param({"1000", "100000", "1000000"})
    private int size;

    @Benchmark
    public List<String> defaultCapacity() {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            list.add("item-" + i);
        }
        return list;
    }

    @Benchmark
    public List<String> preSized() {
        List<String> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add("item-" + i);
        }
        return list;
    }
}
SizeDefault (us)Pre-sized (us)Savings
1,000423517%
100,0005,8004,20028%
1,000,00072,00048,00033%

The savings come from eliminating array allocation and copy operations during growth. At 1M elements, the default-capacity path copies approximately 2.5 million references across ~30 resize events. The pre-sized path copies zero.

Trimming Wasted Capacity

After the list is fully populated, ArrayList may have significant unused capacity. If the list persists in memory (like a cached article feed), that wasted capacity is dead memory:

// After building a list of exactly 10,000 articles,
// the backing array may have capacity 14,000 (after the last growth)
List<Article> feed = buildArticleFeed(); // 10,000 elements, capacity ~14,000

// FAST: release the unused 4,000 slots
((ArrayList<Article>) feed).trimToSize(); // backs array down to exactly 10,000

trimToSize() allocates a new array of exact size and copies all elements. This is a one-time O(n) cost that saves persistent memory. Use it for long-lived, read-heavy collections. Do not use it for collections that continue to grow; you will trigger an immediate resize on the next add().

The Real Decision Framework

Stop choosing collections based on algorithmic complexity classes. Choose based on access pattern:

Use ArrayList when:

  • You iterate more than you insert/remove (almost always)
  • You index by position (get(i), subList)
  • The list is read-heavy after initial population
  • You want the smallest memory footprint

Use LinkedList when:

  • Never. Use ArrayDeque for queue/stack patterns. Use ArrayList for everything else.

Use ArrayDeque when:

  • FIFO queue: offer() / poll()
  • LIFO stack: push() / pop()
  • Producer-consumer patterns (single-threaded)
  • Any double-ended queue operation

Use CopyOnWriteArrayList when:

  • Reads vastly outnumber writes
  • Iteration must not throw ConcurrentModificationException
  • The list is small (under 1,000 elements)
  • Example: the content platform’s list of active content sources (changes once per hour, read every second)

The next two sections benchmark ArrayDeque vs LinkedList for queue patterns and dive deeper into cache miss analysis with hardware performance counters.