Skip to main content
python database internals building a persistent engine from scratch

Inserting into a Leaf Node

8 min read Chapter 11 of 21
Summary

This section focuses on implementing insertion into a...

This section focuses on implementing insertion into a B-Tree leaf node, a critical operation for maintaining sorted key order to enable efficient searches. The leaf node structure consists of a 6-byte header with fields for node type, root flag, and cell count, followed by a data section containing fixed-size cells of 40 bytes each (4-byte key and 36-byte value). Insertion begins by finding the correct position using binary search, which operates in O(log N) time, where N is the number of cells. Once the position is determined, existing cells are shifted right by one cell size to create space, and the new key-value pair is written at the calculated offset. The header's cell count is incremented after successful insertion. If the node is full—reaching a maximum capacity of 102 cells per 4KB page—the operation fails without splitting, as splitting is deferred to later stages. The implementation is demonstrated in Python 3.12+ with type hints, utilizing the struct module for binary data manipulation and memoryview for in-place buffer operations. A data table summarizes key component sizes and capacities, such as page size (4096 bytes), header size (6 bytes), and cell size (40 bytes), while a layout diagram visually explains the memory shifts. This foundational logic supports single-node operations and sets the groundwork for future B-Tree balancing.

Inserting into a Leaf Node

We have a leaf node with a 6-byte header and room for 102 cells. Now we need to insert data into it — and we must maintain sorted key order, because binary search depends on it.

Insertion into a sorted array requires three steps:

  1. Find where the new key belongs (binary search).
  2. Shift existing cells right to open a gap (memmove).
  3. Write the new cell into the gap and update the header.

Step 2 is the expensive one. Shifting cells is $O(k)$ where $k$ is the number of cells after the insertion point. For a node with at most 102 cells, this is bounded and fast. The real cost savings come in Chapter 5, when we avoid scanning thousands of pages by descending the tree to the right leaf in $O(\log N)$ time.

Binary Search for Insertion Position

Given a leaf node with cells in ascending key order, we use a standard lower-bound binary search to find the index where the new key should go:

def leaf_node_find(page: memoryview, key: int) -> int:
    """Return the cell index where `key` should be inserted.

    If the key already exists, returns its index (caller decides
    whether to overwrite or reject duplicates).
    """
    num_cells = leaf_node_num_cells(page)
    lo, hi = 0, num_cells
    while lo < hi:
        mid = (lo + hi) // 2
        mid_key = leaf_node_key(page, mid)
        if mid_key < key:
            lo = mid + 1
        else:
            hi = mid
    return lo

This is a standard bisect_left algorithm. We search by reading keys directly from the page buffer using leaf_node_key() from the previous section — no intermediate list is needed.

Why lower-bound? A lower-bound search returns the first position where the key could be inserted without violating sort order. If the key already exists at that index, leaf_node_key(page, lo) == key, and we can detect duplicates.

The Insertion Function

from enum import Enum, auto

class InsertResult(Enum):
    SUCCESS = auto()
    DUPLICATE_KEY = auto()
    NODE_FULL = auto()


def leaf_node_insert(page: memoryview, key: int, value: bytes) -> InsertResult:
    """Insert a key-value cell into the leaf node, maintaining sorted order.

    Returns InsertResult indicating success or reason for failure.
    """
    num_cells = leaf_node_num_cells(page)

    # --- Guard: node full ---
    if num_cells >= LEAF_NODE_MAX_CELLS:
        return InsertResult.NODE_FULL

    # --- Find insertion position ---
    index = leaf_node_find(page, key)

    # --- Guard: duplicate key ---
    if index < num_cells and leaf_node_key(page, index) == key:
        return InsertResult.DUPLICATE_KEY

    # --- Shift cells [index..num_cells-1] right by one cell ---
    if index < num_cells:
        src_start = leaf_node_cell_offset(index)
        src_end = leaf_node_cell_offset(num_cells)
        dst_start = leaf_node_cell_offset(index + 1)
        # Copy from right to left to avoid overwriting (reverse direction)
        page[dst_start : dst_start + (src_end - src_start)] = page[
            src_start:src_end
        ]

    # --- Write the new cell ---
    offset = leaf_node_cell_offset(index)
    struct.pack_into("<I", page, offset, key)
    page[offset + LEAF_NODE_KEY_SIZE : offset + LEAF_NODE_CELL_SIZE] = value

    # --- Update header ---
    struct.pack_into("<I", page, 2, num_cells + 1)

    return InsertResult.SUCCESS

Walk through each section:

Guard checks. We reject insertion early if the node has 102 cells (splitting is deferred to Chapter 5) or if the key already exists. Duplicate keys would violate the B-Tree invariant that every key appears exactly once in the leaf level.

Cell shifting. The memoryview slice assignment page[dst:dst+n] = page[src:src+n] copies bytes within the same buffer. We shift everything from index to the end of the data region one cell-width (40 bytes) to the right. Because the destination is to the right of the source and we copy the entire block in one operation, there is no overlap corruption — Python’s memoryview slice assignment handles this correctly (it behaves like memmove, not memcpy).

Writing the cell. We pack the 4-byte key with struct.pack_into, then copy the 36-byte serialized row value directly into the page.

Header update. Only the num_cells field at offset 2 needs updating. We write the incremented count as a 4-byte unsigned integer.

Visualizing the Shift

Inserting key 15 into a node containing keys [10, 20, 30]:

Before insertion (3 cells):
┌────────┬──────────┬──────────┬──────────┬──────────────┐
│ Header │ [10|val] │ [20|val] │ [30|val] │  free space  │
│ (6B)   │  cell 0  │  cell 1  │  cell 2  │              │
└────────┴──────────┴──────────┴──────────┴──────────────┘

Step 1: leaf_node_find returns index=1 (15 > 10, 15 < 20)

Step 2: Shift cells 1..2 right by 40 bytes:
┌────────┬──────────┬──────────┬──────────┬──────────┬────┐
│ Header │ [10|val] │ ##gap### │ [20|val] │ [30|val] │free│
│ (6B)   │  cell 0  │  cell 1  │  cell 2  │  cell 3  │    │
└────────┴──────────┴──────────┴──────────┴──────────┴────┘

Step 3: Write [15|val] into cell 1, update num_cells to 4:
┌────────┬──────────┬──────────┬──────────┬──────────┬────┐
│ Header │ [10|val] │ [15|val] │ [20|val] │ [30|val] │free│
│ (6B)   │  cell 0  │  cell 1  │  cell 2  │  cell 3  │    │
└────────┴──────────┴──────────┴──────────┴──────────┴────┘

Testing Insertion

def test_leaf_node_insertion():
    """Verify insertion maintains sorted order and handles edge cases."""
    page = memoryview(bytearray(PAGE_SIZE))
    initialize_leaf_node(page, is_root=True)

    # Insert keys out of order
    keys = [30, 10, 20, 5, 25]
    for k in keys:
        row = Row(id=k, username=f"user_{k}")
        result = leaf_node_insert(page, k, row.serialize())
        assert result == InsertResult.SUCCESS

    # Verify sorted order
    assert leaf_node_num_cells(page) == 5
    expected = sorted(keys)
    for i, expected_key in enumerate(expected):
        assert leaf_node_key(page, i) == expected_key, (
            f"Cell {i}: expected key {expected_key}, got {leaf_node_key(page, i)}"
        )

    # Duplicate rejection
    result = leaf_node_insert(page, 10, Row(id=10, username="dup").serialize())
    assert result == InsertResult.DUPLICATE_KEY

    # Fill to capacity
    page2 = memoryview(bytearray(PAGE_SIZE))
    initialize_leaf_node(page2)
    for k in range(LEAF_NODE_MAX_CELLS):
        result = leaf_node_insert(page2, k, Row(id=k, username=f"u{k}").serialize())
        assert result == InsertResult.SUCCESS

    # 103rd insertion must fail
    overflow = leaf_node_insert(page2, 999, Row(id=999, username="x").serialize())
    assert overflow == InsertResult.NODE_FULL

    print("All insertion tests passed.")


test_leaf_node_insertion()

Performance Characteristics

OperationCostNotes
Find position$O(\log k)$Binary search over $k \leq 102$ keys
Shift cells$O(k)$At most 102 × 40 = 4,080 bytes moved
Write cell$O(1)$Fixed 40-byte write
Header update$O(1)$4-byte write at offset 2
Total$O(k)$Dominated by the shift

The $O(k)$ cost per insertion seems unimpressive, but remember: $k$ is the number of cells within a single leaf node, not the total number of rows in the database. Once we have a tree of many leaf nodes, the cost of finding the right leaf dominates, and that cost is $O(\log N)$ where $N$ is the total row count.

The leaf node is now a complete, self-contained sorted container. What it cannot do yet is handle overflow. When cell 103 arrives and NODE_FULL fires, we need a strategy. That strategy is splitting, and it requires a new kind of node — the internal node — which we build in Chapter 5.

initialize_leaf_node(node) # Assume a function from shared_materials

success = insert_into_leaf_node(node, 42, row.serialize())

Data Table: Component Sizes and Capacities

The following table summarizes key component sizes and capacities in the leaf node, based on constants from existing knowledge:

ComponentSize (bytes)Description
Page Size4096Fixed size for I/O optimization, from PAGE_SIZE constant.
Header Size6Node Type (1 byte) + Is Root (1 byte) + Num Cells (4 bytes).
Cell Key Size4Assumed 4-byte integer for keys.
Cell Value Size36From Row.ROW_SIZE, serialized row data.
Cell Size40Total: Key + Value = 4 + 36 bytes.
Max Cells per Node102Calculated as (4096 - 6) / 40 = floor(102.25).
Data Section Size4090PAGE_SIZE - HEADER_SIZE, available for cells.
Offset CalculationHEADER_SIZE + index * CELL_SIZEFormula to find byte offset for cell at given index.

Leaf Node Layout Diagram

Before Insertion:

  • Header (6 bytes): [Node Type][Is Root][Num Cells]
  • Data Section: Cells stored sequentially, each CELL_SIZE (40 bytes): [Key (4B)][Value (36B)]
  • Empty space after last cell up to PAGE_SIZE.

After Insertion (assuming position found):

  • Header updated: Num Cells incremented.
  • Cells from insertion index to end shifted right by CELL_SIZE.
  • New cell inserted at the calculated offset, maintaining sorted order of keys.

Visual Flow:

  1. Start with sorted keys: [K1, K2, …, Kn]
  2. Find position for new key K_new using binary search.
  3. Shift cells from position onward to the right.
  4. Insert K_new and its value at the vacated position.
  5. Update header count.