Skip to main content
python database internals building a persistent engine from scratch

B-Tree Implementation: Internal Nodes and Splitting

7 min read Chapter 13 of 21
Summary

This section implements the core components of B-Trees...

This section implements the core components of B-Trees for growth and balance. Internal nodes are defined as structures storing sorted keys and child pointers, with a header format including Node Type, Is Root flag, and Num Keys. The binary layout is detailed with a markdown table showing field sizes and descriptions. The splitting algorithm is crucial for handling full nodes: it finds the median key, creates a new sibling node, redistributes keys and pointers, and updates the parent or creates a new root if necessary, ensuring all leaf nodes remain at the same depth. Parent-child relationship tracking is discussed as essential for maintaining tree integrity during these operations. Python code examples demonstrate initializing an internal node using struct and memoryview with type hints, and a stub for the splitting logic is provided. Key terminology includes Internal Node (stores keys and child pointers), Child Pointer (references to child nodes), Splitting Algorithm (procedure for dividing full nodes), and Median Key (middle key promoted during splitting). This builds on previous leaf node implementations to complete the B-Tree structure for efficient search, insertion, and deletion.

Internal Nodes and the Shape of the Tree

A single leaf node holds 102 rows. Beyond that, insertion fails. To store a million rows, we need roughly 10,000 leaf nodes — and a way to route searches to the correct one without scanning all of them. That routing structure is the internal node.

Internal nodes do not store row data. They store keys and child pointers: the keys define boundaries between child subtrees, and the pointers identify which page to descend into. Together, leaf nodes and internal nodes form a balanced tree where every leaf is at the same depth — the defining property of a B-Tree.

Internal Node Layout

An internal node has the same 6-byte header as a leaf node (we reuse the format), followed by a right child pointer and then a sequence of cells. Each cell is a (child pointer, key) pair:

┌──────────────────────────────────────────────────────────┐
│  Node Type (1B) │ Is Root (1B) │ Num Keys (4B)          │
├──────────────────────────────────────────────────────────┤
│  Right Child (4B)                                        │
├──────────────────────────────────────────────────────────┤
│  Cell 0: [Child Ptr (4B)] [Key (4B)]                    │
│  Cell 1: [Child Ptr (4B)] [Key (4B)]                    │
│  ...                                                      │
│  Cell N-1                                                 │
│  [unused space]                                           │
└──────────────────────────────────────────────────────────┘

Why a separate right child? An internal node with $k$ keys has $k+1$ child pointers. The cells store $k$ (child, key) pairs, but that accounts for only $k$ child pointers. The extra $(k+1)$-th pointer — the rightmost child — is stored immediately after the header, before the cells.

The routing invariant is:

  • All keys in child[i] < key[i] ≤ all keys in child[i+1]
  • All keys in right_child > key[N-1] (the last key)

Constants

# btree.py (continued)

# --- Internal Node Layout ---
INTERNAL_NODE_HEADER_SIZE: Final[int] = LEAF_NODE_HEADER_SIZE  # 6 (same format)
INTERNAL_NODE_RIGHT_CHILD_SIZE: Final[int] = 4
INTERNAL_NODE_RIGHT_CHILD_OFFSET: Final[int] = INTERNAL_NODE_HEADER_SIZE  # 6

INTERNAL_NODE_KEY_SIZE: Final[int] = 4
INTERNAL_NODE_CHILD_SIZE: Final[int] = 4
INTERNAL_NODE_CELL_SIZE: Final[int] = (
    INTERNAL_NODE_CHILD_SIZE + INTERNAL_NODE_KEY_SIZE
)  # 8

INTERNAL_NODE_CELLS_OFFSET: Final[int] = (
    INTERNAL_NODE_HEADER_SIZE + INTERNAL_NODE_RIGHT_CHILD_SIZE
)  # 10

INTERNAL_NODE_MAX_KEYS: Final[int] = (
    (PAGE_SIZE - INTERNAL_NODE_CELLS_OFFSET) // INTERNAL_NODE_CELL_SIZE
)  # (4096 - 10) // 8 = 510

Each internal node holds up to 510 keys (and 511 child pointers). This is the B-Tree’s branching factor — the number of children per internal node. A B-Tree with branching factor 511 and depth 3 can index:

$$511^2 \times 102 = 26{,}632{,}242 \text{ rows}$$

Over 26 million rows reachable in three page reads. That is why B-Trees dominate disk-based indexing.

Initialization

def initialize_internal_node(page: memoryview, is_root: bool = False) -> None:
    """Write a fresh internal node header into the page buffer."""
    struct.pack_into(
        LEAF_NODE_HEADER_FORMAT,  # same format: <B B I
        page,
        0,
        INTERNAL_NODE_TYPE,      # node_type = 1 (internal)
        1 if is_root else 0,
        0,                       # num_keys = 0
    )
    # Zero the right child pointer
    struct.pack_into("<I", page, INTERNAL_NODE_RIGHT_CHILD_OFFSET, 0)
    # Zero the data section
    page[INTERNAL_NODE_CELLS_OFFSET:] = b"\x00" * (
        PAGE_SIZE - INTERNAL_NODE_CELLS_OFFSET
    )

Accessor Functions

def internal_node_num_keys(page: memoryview) -> int:
    """Read num_keys from the header."""
    return struct.unpack_from("<I", page, 2)[0]


def internal_node_right_child(page: memoryview) -> int:
    """Read the right child page number."""
    return struct.unpack_from("<I", page, INTERNAL_NODE_RIGHT_CHILD_OFFSET)[0]


def internal_node_set_right_child(page: memoryview, page_num: int) -> None:
    """Write the right child page number."""
    struct.pack_into("<I", page, INTERNAL_NODE_RIGHT_CHILD_OFFSET, page_num)


def internal_node_cell_offset(cell_num: int) -> int:
    """Return the byte offset of cell `cell_num`."""
    return INTERNAL_NODE_CELLS_OFFSET + cell_num * INTERNAL_NODE_CELL_SIZE


def internal_node_child(page: memoryview, cell_num: int) -> int:
    """Read the child pointer from cell `cell_num`."""
    offset = internal_node_cell_offset(cell_num)
    return struct.unpack_from("<I", page, offset)[0]


def internal_node_key(page: memoryview, cell_num: int) -> int:
    """Read the key from cell `cell_num`."""
    offset = internal_node_cell_offset(cell_num) + INTERNAL_NODE_CHILD_SIZE
    return struct.unpack_from("<I", page, offset)[0]


def internal_node_set_cell(
    page: memoryview, cell_num: int, child: int, key: int
) -> None:
    """Write a (child, key) pair into cell `cell_num`."""
    offset = internal_node_cell_offset(cell_num)
    struct.pack_into("<I", page, offset, child)
    struct.pack_into("<I", page, offset + INTERNAL_NODE_CHILD_SIZE, key)

Each function is small and operates at fixed offsets. This pattern — one function per field, all operating on memoryview — keeps the B-Tree code readable and testable.

Finding the Correct Child

When searching for a key, we descend from the root through internal nodes until we reach a leaf. At each internal node, we must determine which child to follow. This is a binary search on the internal node’s keys:

def internal_node_find_child(page: memoryview, key: int) -> int:
    """Return the page number of the child that may contain `key`.

    Uses binary search on internal node keys to find the correct child pointer.
    """
    num_keys = internal_node_num_keys(page)

    # Binary search for the first key >= target
    lo, hi = 0, num_keys
    while lo < hi:
        mid = (lo + hi) // 2
        if internal_node_key(page, mid) < key:
            lo = mid + 1
        else:
            hi = mid

    # lo is now the index of the first key >= target
    if lo < num_keys:
        # Key belongs in the subtree to the left of key[lo]
        return internal_node_child(page, lo)
    else:
        # Key is greater than all keys — descend into right child
        return internal_node_right_child(page)

Walk through an example. Suppose an internal node has keys [50, 100, 150] and children [C0, C1, C2, C3_right]:

Search keylo after searchResult
250child(0) = C0 (25 < 50)
500child(0) = C0 (50 ≤ 50, exact match goes left)
751child(1) = C1 (50 < 75 < 100)
2003 (= num_keys)right_child = C3 (200 > 150)

The Full Search: Root to Leaf

With internal_node_find_child, we can write a top-level search that descends the tree:

def tree_find(pager: Pager, root_page_num: int, key: int) -> Cursor:
    """Find the leaf node cell for `key`, descending from the root."""
    page = pager.get_page(root_page_num)
    node_type = struct.unpack_from("B", page, 0)[0]  # offset 0 = node type

    if node_type == LEAF_NODE_TYPE:
        # We're at a leaf — use leaf-level binary search
        return Cursor.find(pager, root_page_num, key)

    # Internal node — find the correct child and recurse
    child_page_num = internal_node_find_child(page, key)
    return tree_find(pager, child_page_num, key)

This recursive descent reads one page per tree level. For a tree of depth $d$, the search cost is $O(d \cdot \log b)$ where $b$ is the branching factor (510 for internal nodes, 102 for leaf binary search). Since $d = O(\log_b N)$, the total cost is $O(\log N)$.

Geometry Summary

PropertyLeaf NodeInternal Node
Node type byte01
Header size6 bytes6 bytes
Extra metadataRight child (4 bytes)
Cell contentkey (4B) + value (36B)child ptr (4B) + key (4B)
Cell size40 bytes8 bytes
Max cells/keys102510
Max children511

What the Tree Cannot Do Yet

We have defined the internal node format and the search algorithm that descends through it. But we have no code to create internal nodes. They come into existence through splitting: when a leaf node overflows, it splits in two, and a new key is promoted into the parent. If the parent does not exist (the leaf was the root), a new root internal node is created.

The next section implements the splitting algorithm — the mechanism by which the tree grows.