B-Tree Implementation: Leaf Nodes
SummaryThis section introduces the structure of leaf nodes...
This section introduces the structure of leaf nodes...
This section introduces the structure of leaf nodes in a B-Tree, the foundational component for database indexing. Leaf nodes store key-value pairs without child pointers, using a binary layout for efficient access. The leaf node header contains metadata: Node Type (0 for leaf, 1 for internal), Is Root flag (boolean), and Num Cells (unsigned integer), formatted in little-endian byte order. Each cell consists of a key (4-byte integer) for binary search and a value (36-byte serialized row data) for actual storage, enabling O(log N) access times. The implementation is in Python 3.12+ with type hints, using the struct module and memoryview for low-level binary manipulation. Key functions include initialize_leaf_node to create nodes with default headers, create_cell to pack key-value pairs, and extract_cell to retrieve data without copying. The design aligns with page-based storage (4096-byte pages) and leverages fixed offsets for direct access, setting the stage for B-Tree operations.
B-Tree Implementation: Leaf Nodes
The benchmarks in the previous chapter proved that linear scan is untenable for large datasets. The solution is a B-Tree — a self-balancing tree optimized for systems that read and write fixed-size blocks. B-Trees were invented in 1970 by Rudolf Bayer and Edward McCreight specifically for disk-based storage, and nearly every production database engine uses some variant of them today.
A B-Tree is built from two kinds of nodes: leaf nodes that store actual key-value data, and internal nodes that store keys and child pointers for routing searches. We start with leaf nodes because they are simpler and immediately useful — a single leaf node is already a sorted, searchable container.
The Paradigm Shift: Pages Become Nodes
Until now, pages held rows in insertion order. Starting here, every page is a B-Tree node. The Pager does not change — it still reads and writes 4096-byte blocks — but the interpretation of those bytes changes completely.
Each page now begins with a node header instead of raw row data:
┌──────────────────────────────────────────────┐
│ Node Type (1B) │ Is Root (1B) │ Num Cells (4B) │
├──────────────────────────────────────────────┤
│ Cell 0: [Key (4B)] [Value (36B)] │
│ Cell 1: [Key (4B)] [Value (36B)] │
│ ... │
│ Cell N-1 │
│ [unused space / zero padding] │
└──────────────────────────────────────────────┘
Leaf Node Header
The header occupies 6 bytes at the start of every leaf node:
| Field | Format | Size | Offset | Description |
|---|---|---|---|---|
| Node Type | B (unsigned byte) | 1 | 0 | 0 = leaf, 1 = internal |
| Is Root | B (unsigned byte) | 1 | 1 | 1 if this is the tree’s root |
| Num Cells | <I (unsigned 32-bit int) | 4 | 2 | Count of key-value cells stored |
The format string is <B B I — little-endian, 6 bytes total.
Leaf Node Cells
After the 6-byte header, cells are packed contiguously. Each cell holds a key and a value:
| Field | Size | Description |
|---|---|---|
| Key | 4 bytes | Unsigned 32-bit integer, the row’s id |
| Value | 36 bytes | Serialized Row (the full ROW_SIZE payload) |
| Cell total | 40 bytes | LEAF_NODE_CELL_SIZE |
Cells are stored in ascending key order. This invariant is what makes binary search possible. The maximum number of cells per leaf node:
$$\text{max_cells} = \left\lfloor \frac{\text{PAGE_SIZE} - \text{HEADER_SIZE}}{\text{CELL_SIZE}} \right\rfloor = \left\lfloor \frac{4096 - 6}{40} \right\rfloor = 102$$
So each leaf node holds up to 102 key-value pairs, with 10 bytes of unused space at the end of the page.
Constants and Node Initialization
# btree.py
import struct
from typing import Final
from constants import PAGE_SIZE
from row import Row
# --- Leaf Node Layout ---
LEAF_NODE_TYPE: Final[int] = 0
INTERNAL_NODE_TYPE: Final[int] = 1
LEAF_NODE_HEADER_FORMAT: Final[str] = "<B B I"
LEAF_NODE_HEADER_SIZE: Final[int] = struct.calcsize(LEAF_NODE_HEADER_FORMAT) # 6
LEAF_NODE_KEY_SIZE: Final[int] = 4
LEAF_NODE_VALUE_SIZE: Final[int] = Row.ROW_SIZE # 36
LEAF_NODE_CELL_SIZE: Final[int] = LEAF_NODE_KEY_SIZE + LEAF_NODE_VALUE_SIZE # 40
LEAF_NODE_MAX_CELLS: Final[int] = (
(PAGE_SIZE - LEAF_NODE_HEADER_SIZE) // LEAF_NODE_CELL_SIZE
) # 102
def initialize_leaf_node(page: memoryview, is_root: bool = False) -> None:
"""Write a fresh leaf node header into the page buffer."""
struct.pack_into(
LEAF_NODE_HEADER_FORMAT,
page,
0,
LEAF_NODE_TYPE, # node_type = 0 (leaf)
1 if is_root else 0, # is_root flag
0, # num_cells = 0
)
# Zero out the data section
page[LEAF_NODE_HEADER_SIZE:] = b"\x00" * (PAGE_SIZE - LEAF_NODE_HEADER_SIZE)
We use struct.pack_into instead of struct.pack because it writes directly into the memoryview at a given offset — no intermediate bytes object, no copy.
Reading and Writing Cells
Two helper functions to access cells by index within a leaf node:
def leaf_node_cell_offset(cell_num: int) -> int:
"""Return the byte offset of cell `cell_num` within the page."""
return LEAF_NODE_HEADER_SIZE + cell_num * LEAF_NODE_CELL_SIZE
def leaf_node_key(page: memoryview, cell_num: int) -> int:
"""Read the key from cell `cell_num`."""
offset = leaf_node_cell_offset(cell_num)
return struct.unpack_from("<I", page, offset)[0]
def leaf_node_value(page: memoryview, cell_num: int) -> memoryview:
"""Return a memoryview of the value (serialized Row) in cell `cell_num`."""
offset = leaf_node_cell_offset(cell_num) + LEAF_NODE_KEY_SIZE
return page[offset : offset + LEAF_NODE_VALUE_SIZE]
def leaf_node_num_cells(page: memoryview) -> int:
"""Read the num_cells field from the header."""
return struct.unpack_from("<I", page, 2)[0]
These functions form the lowest-level access API for leaf nodes. Everything above — insertion, search, splitting — is built on top of them.
Summary of Leaf Node Geometry
| Property | Value | Derivation |
|---|---|---|
| Page size | 4096 bytes | PAGE_SIZE constant |
| Header size | 6 bytes | <B B I → 1 + 1 + 4 |
| Cell size | 40 bytes | 4 (key) + 36 (value) |
| Max cells per leaf | 102 | (4096 − 6) ÷ 40 |
| Wasted bytes per page | 10 | 4096 − 6 − (102 × 40) |
| Data capacity per leaf | 4080 bytes | 102 × 40 |
With the leaf node structure defined, we can proceed to the two critical operations: inserting a key-value pair in sorted order, and searching for a key using binary search.