The Splitting Algorithm
The Splitting Algorithm
When leaf_node_insert returns NODE_FULL, we face a choice: reject the insertion, or make room. Rejection is not acceptable for a general-purpose storage engine. Making room means splitting the full node into two nodes, each roughly half full, and promoting a key into the parent to maintain the search invariant.
Splitting is the most complex operation in a B-Tree. It touches three pages simultaneously (the full node, the new sibling, and the parent), and it has a special case when the root splits. We will implement it methodically, starting with the leaf split and then handling root promotion.
The Leaf Split Procedure
When a leaf node with 102 cells receives a 103rd insertion:
- Allocate a new page for the right sibling.
- Determine the split point: cell index 51 (the median).
- Copy cells [51..101] from the old leaf into the new sibling.
- Truncate the old leaf to 51 cells.
- Insert the new key-value pair into whichever half it belongs (old left or new right).
- The key at the split point (the promoted key) goes up to the parent internal node.
Before split (102 cells, inserting key K):
┌──────────────────────────────────────────────┐
│ [k0] [k1] ... [k50] [k51] ... [k101] │ ← full leaf
└──────────────────────────────────────────────┘
After split:
┌──────────────────────┐ ┌──────────────────────┐
│ [k0] ... [k50] │ │ [k51] ... [k101] │
│ left leaf (51 cells) │ │ right leaf (51 cells) │
└──────────────────────┘ └──────────────────────┘
↑
promoted key = k51
goes to parent internal node
Implementation
LEAF_NODE_LEFT_SPLIT_COUNT: Final[int] = (LEAF_NODE_MAX_CELLS + 1) // 2 # 51
LEAF_NODE_RIGHT_SPLIT_COUNT: Final[int] = (
LEAF_NODE_MAX_CELLS + 1 - LEAF_NODE_LEFT_SPLIT_COUNT
) # 52 (the extra cell is the one being inserted)
def leaf_node_split_and_insert(
pager: Pager,
old_page_num: int,
key: int,
value: bytes,
) -> tuple[int, int]:
"""Split a full leaf node and insert the new key-value pair.
Returns (promoted_key, new_page_num) for the caller to insert
into the parent internal node.
"""
old_page = pager.get_page(old_page_num)
old_num_cells = leaf_node_num_cells(old_page)
assert old_num_cells == LEAF_NODE_MAX_CELLS, "Only split when full"
# --- Step 1: Allocate the new right sibling ---
new_page_num = pager.get_unused_page_num()
new_page = pager.get_page(new_page_num)
initialize_leaf_node(new_page, is_root=False)
# --- Step 2: Build a virtual list of 103 cells (102 existing + 1 new) ---
# We'll distribute them across old (left) and new (right).
# The new key is inserted at its sorted position.
insert_index = leaf_node_find(old_page, key)
# We iterate from cell 102 down to 0 (103 total).
# Cells at indices >= LEAF_NODE_LEFT_SPLIT_COUNT go to the right sibling.
for i in range(LEAF_NODE_MAX_CELLS, -1, -1):
if i >= LEAF_NODE_LEFT_SPLIT_COUNT:
dest_page = new_page
dest_cell = i - LEAF_NODE_LEFT_SPLIT_COUNT
else:
dest_page = old_page
dest_cell = i
dest_offset = leaf_node_cell_offset(dest_cell)
if i == insert_index:
# This is where the new cell goes
struct.pack_into("<I", dest_page, dest_offset, key)
dest_page[
dest_offset + LEAF_NODE_KEY_SIZE
: dest_offset + LEAF_NODE_CELL_SIZE
] = value
elif i > insert_index:
# Shift: this cell was previously at index (i - 1)
src_offset = leaf_node_cell_offset(i - 1)
dest_page[dest_offset : dest_offset + LEAF_NODE_CELL_SIZE] = old_page[
src_offset : src_offset + LEAF_NODE_CELL_SIZE
]
else:
# i < insert_index: cell stays at its original index
if dest_page is not old_page or dest_cell != i:
src_offset = leaf_node_cell_offset(i)
dest_page[
dest_offset : dest_offset + LEAF_NODE_CELL_SIZE
] = old_page[src_offset : src_offset + LEAF_NODE_CELL_SIZE]
# --- Step 3: Update cell counts ---
struct.pack_into("<I", old_page, 2, LEAF_NODE_LEFT_SPLIT_COUNT)
struct.pack_into("<I", new_page, 2, LEAF_NODE_RIGHT_SPLIT_COUNT)
# --- Step 4: The promoted key is the first key in the right sibling ---
promoted_key = leaf_node_key(new_page, 0)
return promoted_key, new_page_num
Key design decisions:
- We iterate in reverse to avoid overwriting cells we have not yet copied. This is the same principle as
memmovefor overlapping regions. - The promoted key is the smallest key in the right sibling. This maintains the B-Tree invariant: “all keys in the left child < promoted key ≤ all keys in the right child.”
- Cell counts are written atomically — 51 in the left, 52 in the right (the extra cell is the newly inserted one).
Splitting the Root
The root split is a special case. Normally, the promoted key is inserted into the parent. But if the root is a leaf and it splits, there is no parent yet. We must create a new root internal node:
def create_new_root(
pager: Pager,
old_root_page_num: int,
promoted_key: int,
right_child_page_num: int,
) -> int:
"""Create a new root internal node after the old root splits.
The old root becomes the left child; a new page becomes the root.
Returns the page number of the new root.
"""
# Allocate a new page for the old root's data
left_child_page_num = pager.get_unused_page_num()
left_child_page = pager.get_page(left_child_page_num)
# Copy old root content to the left child page
old_root_page = pager.get_page(old_root_page_num)
left_child_page[:PAGE_SIZE] = old_root_page[:PAGE_SIZE]
# Mark left child as non-root
struct.pack_into("B", left_child_page, 1, 0) # is_root = 0
# Reinitialize the old root page as an internal node
initialize_internal_node(old_root_page, is_root=True)
# Set up the new root: one key, left child in cell 0, right child
internal_node_set_cell(old_root_page, 0, left_child_page_num, promoted_key)
internal_node_set_right_child(old_root_page, right_child_page_num)
struct.pack_into("<I", old_root_page, 2, 1) # num_keys = 1
return old_root_page_num # root stays at the same page number
Why keep the root at the same page number? Every reference to the tree points to the root page. If we moved the root to a different page, we would need to update every caller. Instead, we copy the old root’s data to a new page (the left child) and reuse the root page for the new internal node.
Visualizing a Root Split
Before (root is leaf, 102 cells, inserting key K):
Page 0 (root, leaf):
┌──────────────────────────────────────┐
│ [k0] [k1] ... [k50] [k51] ... [k101]│
└──────────────────────────────────────┘
After:
Page 0 (root, internal):
┌────────────────────────────────┐
│ num_keys=1 │
│ right_child → Page 2 │
│ cell 0: [child=1, key=k51] │
└────────────────────────────────┘
↙ ↘
Page 1 (leaf): Page 2 (leaf):
┌────────────┐ ┌────────────┐
│ [k0..k50] │ │ [k51..k101]│
│ 51 cells │ │ + new key │
└────────────┘ └────────────┘
The tree grew from depth 1 (single leaf) to depth 2 (one internal node + two leaves). Total capacity jumped from 102 to $2 \times 102 = 204$ rows. The next split will add a third leaf, increasing capacity to 306 rows, without adding another level.
The Unified Insert Function
We can now write the top-level insertion that handles both the normal case and the split case:
def tree_insert(pager: Pager, root_page_num: int, key: int, value: bytes) -> None:
"""Insert a key-value pair into the B-Tree rooted at `root_page_num`."""
page = pager.get_page(root_page_num)
node_type = struct.unpack_from("B", page, 0)[0]
if node_type == LEAF_NODE_TYPE:
result = leaf_node_insert(page, key, value)
if result == InsertResult.SUCCESS:
return
elif result == InsertResult.DUPLICATE_KEY:
raise ValueError(f"Duplicate key: {key}")
elif result == InsertResult.NODE_FULL:
# Split the leaf
promoted_key, new_page_num = leaf_node_split_and_insert(
pager, root_page_num, key, value
)
# Check if this leaf was the root
is_root = struct.unpack_from("B", page, 1)[0]
if is_root:
create_new_root(pager, root_page_num, promoted_key, new_page_num)
else:
# Insert promoted key into parent (requires parent tracking)
raise NotImplementedError(
"Non-root leaf split requires parent tracking"
)
else:
# Internal node — descend to the correct child
child_page_num = internal_node_find_child(page, key)
tree_insert(pager, child_page_num, key, value)
The NotImplementedError is intentional. To insert the promoted key into a non-root parent, we need to track the path from root to leaf during descent. Several strategies exist: passing a parent stack, storing parent pointers in nodes, or using iterative descent with an explicit stack. We defer this to CH5-S2 where we also handle internal node splits.
Testing the Split
def test_leaf_split():
"""Verify that inserting beyond capacity triggers a split."""
import os
db_path = "test_split.db"
if os.path.exists(db_path):
os.remove(db_path)
pager = Pager(db_path)
# Initialize root as a leaf
root_page = pager.get_page(0)
initialize_leaf_node(root_page, is_root=True)
pager.num_pages = 1
# Insert 103 rows (will trigger one split)
for k in range(103):
row = Row(id=k, username=f"user_{k:03d}")
tree_insert(pager, 0, k, row.serialize())
# After split: page 0 is internal, pages 1 and 2 are leaves
root = pager.get_page(0)
assert struct.unpack_from("B", root, 0)[0] == INTERNAL_NODE_TYPE
assert internal_node_num_keys(root) == 1
left_page_num = internal_node_child(root, 0)
right_page_num = internal_node_right_child(root)
left = pager.get_page(left_page_num)
right = pager.get_page(right_page_num)
left_count = leaf_node_num_cells(left)
right_count = leaf_node_num_cells(right)
assert left_count + right_count == 103
# Verify all 103 keys are findable
for k in range(103):
cursor = tree_find(pager, 0, k)
row = Row.deserialize(bytes(cursor.value()))
assert row.id == k
pager.close()
os.remove(db_path)
print("Leaf split test passed.")
test_leaf_split()
This test verifies three properties:
- The root became an internal node after the split.
- All 103 rows are distributed across two leaves.
- Every key is still findable via
tree_find.
The splitting algorithm is correct but incomplete — it only handles the case where the root leaf splits. The next section extends the cursor to traverse multi-level trees and implements the full insertion path with parent tracking.