Projects That Build Understanding

Reading about systems is not the same as building them. You can read every page of CSAPP and still not understand cache behavior until you’ve watched your own code run three times slower because you iterated over a matrix column-by-column instead of row-by-row. These five projects are designed to produce that kind of understanding — the kind that lives in your fingers, not just your head.

Each project is a specification, not a tutorial. You get learning objectives, functional requirements, and a suggested approach. You don’t get step-by-step instructions, because step-by-step instructions teach you to follow instructions.

Project 1: Build a Memory Allocator

Learning Objectives: Understand how dynamic memory allocation works beneath malloc() and free(). Experience fragmentation, coalescing, and the trade-off between allocation speed and memory utilization.

Specification:

Implement a memory allocator in C that provides two functions:

void* my_malloc(size_t size) — allocate at least size bytes and return a pointer to the usable region
void my_free(void* ptr) — mark the region as available for future allocations

Requirements:

Use sbrk() or mmap() to request memory from the OS
Maintain a free list of available blocks
On my_free(), coalesce adjacent free blocks into a single larger block
Split blocks that are significantly larger than the requested size
Handle alignment (return 8-byte aligned addresses)

Suggested Approach: Start with a linked list of free blocks. Each block has a header containing its size and whether it’s free. my_malloc walks the free list looking for a block large enough (first-fit is fine). my_free marks a block as available and checks its neighbors in memory; keeping the list ordered by address, or storing the size again in a footer at the end of each block, makes that check cheap. Get first-fit working, then implement coalescing, then splitting.
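
Before committing to C, it can help to model the bookkeeping on its own. The sketch below is a Python simulation, not the allocator the project asks for: the "heap" is just a range of offsets, and Arena, Block, HEADER, and align_up are names invented here for illustration. It assumes a 16-byte header and an address-ordered block list, and shows first-fit, splitting, and coalescing in that order.

```python
from dataclasses import dataclass

ALIGN = 8     # alignment required by the spec
HEADER = 16   # assumed per-block header size in bytes (hypothetical)


def align_up(n: int, a: int = ALIGN) -> int:
    """Round n up to the next multiple of a (a must be a power of two)."""
    return (n + a - 1) & ~(a - 1)


@dataclass
class Block:
    offset: int        # where the header starts inside the arena
    size: int          # payload bytes, header not included
    free: bool = True


class Arena:
    """Address-ordered block list standing in for memory obtained via sbrk()."""

    def __init__(self, total: int):
        self.blocks = [Block(0, total - HEADER)]

    def malloc(self, size: int):
        size = align_up(max(size, 1))
        for i, b in enumerate(self.blocks):           # first fit
            if b.free and b.size >= size:
                leftover = b.size - size
                # Split only if the remainder can hold a header plus a minimal payload.
                if leftover >= HEADER + ALIGN:
                    rest = Block(b.offset + HEADER + size, leftover - HEADER)
                    self.blocks.insert(i + 1, rest)
                    b.size = size
                b.free = False
                return b.offset + HEADER              # "pointer" to the payload
        return None                                   # a real allocator would grow the heap here

    def free(self, ptr: int) -> None:
        # Raises StopIteration for a pointer this arena never handed out.
        i = next(k for k, b in enumerate(self.blocks) if b.offset + HEADER == ptr)
        self.blocks[i].free = True
        # Coalesce with the next block, then with the previous one.
        if i + 1 < len(self.blocks) and self.blocks[i + 1].free:
            nxt = self.blocks.pop(i + 1)
            self.blocks[i].size += HEADER + nxt.size
        if i > 0 and self.blocks[i - 1].free:
            cur = self.blocks.pop(i)
            self.blocks[i - 1].size += HEADER + cur.size
```

Allocate three regions, free the outer two, and inspect arena.blocks: you get two free blocks separated by a used one, and neither may be big enough for the next request even though their combined size is. That is fragmentation, visible before you have written a line of pointer arithmetic.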

What You’ll Understand: Why malloc is not trivial. Why fragmentation is an emergent property of any allocation strategy. Why languages with garbage collectors pay a real cost for compaction. Why pool allocators exist. When someone says “memory leak,” you’ll think in terms of blocks and free lists instead of shrugging and restarting the process.

Project 2: Build a Key-Value Store

Learning Objectives: Understand how databases persist data to disk, why write-ahead logs exist, and why the performance characteristics of writes and reads are fundamentally different in storage engines.

Specification:

Implement a persistent key-value store in Python with this interface:

put(key: str, value: str) — store a key-value pair, overwriting if the key exists
get(key: str) -> str | None — retrieve the value for a key, or None if absent
delete(key: str) — remove a key

Requirements:

Data must survive process restarts (persist to disk)
Implement a simplified LSM (Log-Structured Merge) tree:
- Writes go to an in-memory sorted dict (the “memtable”)
- When the memtable exceeds 1000 entries, flush it to a sorted file on disk (an “SSTable”)
- Reads check the memtable first, then scan SSTables from newest to oldest
Implement compaction: merge multiple SSTables into a single sorted file
The store must handle at least 10,000 puts and gets without corruption
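
Compaction is, at its core, a merge of sorted runs in which the newest value for each key wins. A minimal sketch of that idea follows, assuming each SSTable has already been loaded as a list of (key, value) pairs sorted by key with unique keys, and assuming deletions are recorded as a None value (a "tombstone"). The function name compact and the TOMBSTONE marker are illustrative choices, not part of the specification.

```python
import heapq
from typing import List, Optional, Tuple

TOMBSTONE = None  # assumed deletion marker (hypothetical)

Entry = Tuple[str, Optional[str]]


def compact(tables_newest_first: List[List[Entry]]) -> List[Entry]:
    """Merge sorted tables into one; for duplicate keys the newest table wins."""

    def tagged(age: int, table: List[Entry]):
        # age 0 is the newest table, so it sorts first among equal keys.
        for key, value in table:
            yield key, age, value

    streams = [tagged(age, t) for age, t in enumerate(tables_newest_first)]
    merged: List[Entry] = []
    last_key = None
    for key, _age, value in heapq.merge(*streams):
        if key == last_key:
            continue                  # an older version of a key already handled
        last_key = key
        # Dropping tombstones is only safe because every table is merged here.
        if value is not TOMBSTONE:
            merged.append((key, value))
    return merged
```

For example, merging a newer table [("a", "3"), ("c", None)] with an older table [("a", "1"), ("b", "2"), ("c", "9")] keeps the newer "a", keeps "b", and drops "c" entirely. A real implementation would stream from files instead of holding whole tables in lists.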

Suggested Approach: Start with a simple append-only file where every operation writes a line. Then add the memtable; at that point the append-only file becomes your write-ahead log, replayed on startup to recover writes that were never flushed. Then add SSTable flushing. Then add compaction. Each step is a meaningful improvement that teaches you something about why real databases are structured the way they are.
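
To make the middle steps concrete, here is a rough sketch of the memtable and SSTable flushing only. It deliberately has no write-ahead log and no compaction, so unflushed writes are lost on a crash. The class name TinyLSM, the JSON-lines file format, the .sst naming scheme, and the use of None as a tombstone are all assumptions made for this example.

```python
import json
import os
from typing import Dict, List, Optional


class TinyLSM:
    """Memtable plus SSTable flushing; no write-ahead log, no compaction."""

    def __init__(self, directory: str, memtable_limit: int = 1000):
        self.directory = directory
        self.limit = memtable_limit
        self.memtable: Dict[str, Optional[str]] = {}
        os.makedirs(directory, exist_ok=True)
        # Zero-padded names sort oldest to newest.
        self.sstables: List[str] = sorted(
            f for f in os.listdir(directory) if f.endswith(".sst")
        )

    def put(self, key: str, value: Optional[str]) -> None:
        self.memtable[key] = value
        if len(self.memtable) > self.limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, None)           # None acts as a tombstone

    def flush(self) -> None:
        if not self.memtable:
            return
        name = f"{len(self.sstables):06d}.sst"
        with open(os.path.join(self.directory, name), "w", encoding="utf-8") as f:
            for key in sorted(self.memtable):
                f.write(json.dumps([key, self.memtable[key]]) + "\n")
            f.flush()
            os.fsync(f.fileno())      # ask the OS to push the file to disk
        self.sstables.append(name)
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for name in reversed(self.sstables):          # newest first
            with open(os.path.join(self.directory, name), encoding="utf-8") as f:
                for line in f:
                    k, v = json.loads(line)
                    if k == key:
                        return v      # may be a tombstone (None)
        return None
```

Notice that get scans every line of every file in the worst case. The files are sorted, so a binary search or a sparse index is possible; adding one is a good way to feel why reads are the expensive side of this design.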

What You’ll Understand: Why LSM trees favor writes over reads. Why compaction is necessary and expensive. Why write-ahead logs prevent data loss. Why your database vendor talks about “write amplification” and “space amplification.” Why choosing between an LSM and a B-tree is an actual engineering decision with measurable consequences.

Project 3: Build an HTTP Client

Learning Objectives: Understand the HTTP protocol at the byte level. See how request formatting, header parsing, chunked encoding, and redirects actually work beneath requests.get().

Specification:

Implement an HTTP client in Python using plain TCP sockets from the socket module (no HTTP libraries):

get(url: str) -> Response — fetch a URL via GET
post(url: str, body: str) -> Response — send a POST with a body

Requirements:

Parse the URL to extract host, port, and path
Resolve DNS and connect via TCP
Send properly formatted HTTP/1.1 requests with Host, Connection, and Content-Length headers
Parse the response: status line, headers, body
Handle Content-Length-delimited bodies
Handle chunked transfer encoding
Follow 301/302 redirects (up to 5 hops)
Handle HTTPS via SSLContext.wrap_socket() on a context from ssl.create_default_context() (this one you’re allowed to delegate to the standard library)

Here’s your starting point:

import socket
import ssl
from urllib.parse import urljoin, urlparse

class Response:
    def __init__(self, status_code, headers, body):
        self.status_code = status_code
        self.headers = headers  # dict; names lower-cased, repeated headers overwrite
        self.body = body        # raw bytes, not decoded

def http_get(url, max_redirects=5):
    # One initial request plus up to max_redirects redirect hops
    for _ in range(max_redirects + 1):
        parsed = urlparse(url)
        host = parsed.hostname
        port = parsed.port or (443 if parsed.scheme == 'https' else 80)
        path = parsed.path or '/'
        if parsed.query:
            path += '?' + parsed.query

        # create_connection resolves the host name and opens the TCP connection
        sock = socket.create_connection((host, port), timeout=10)
        try:
            if parsed.scheme == 'https':
                context = ssl.create_default_context()
                sock = context.wrap_socket(sock, server_hostname=host)

            request = (
                f'GET {path} HTTP/1.1\r\n'
                f'Host: {host}\r\n'
                f'Connection: close\r\n'
                f'User-Agent: learning-http-client/1.0\r\n'
                f'\r\n'
            )
            sock.sendall(request.encode())

            # Receive response: this is where you start building
            raw = b''
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                raw += chunk
        finally:
            sock.close()

        header_end = raw.find(b'\r\n\r\n')
        if header_end == -1:
            raise ValueError('Malformed response: no end of headers found')
        # ISO-8859-1 maps every byte to a character, so decoding cannot fail
        header_block = raw[:header_end].decode('iso-8859-1')
        body = raw[header_end + 4:]

        lines = header_block.split('\r\n')
        status_code = int(lines[0].split(' ', 2)[1])
        headers = {}
        for line in lines[1:]:
            # The space after the colon is optional, so split on ':' alone
            k, _, v = line.partition(':')
            headers[k.strip().lower()] = v.strip()

        if status_code in (301, 302) and 'location' in headers:
            # Location may be relative; resolve it against the current URL
            url = urljoin(url, headers['location'])
            continue

        return Response(status_code, headers, body)

    raise Exception('Too many redirects')

This handles GET with redirects and TLS, but it has real gaps. It relies on Connection: close and reads until the server closes the socket, buffering the entire response in memory (which breaks on large responses). It doesn’t handle chunked encoding, and it doesn’t support POST. Those are your problems to solve.
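
As a hint for the POST gap, the part that matters is that Content-Length counts bytes, not characters, so it has to be computed after the body is encoded. The helper below is a sketch under assumptions: build_request is a made-up name, the body is assumed to be UTF-8 text, and the Content-Type value is a placeholder you would change to match what you actually send.

```python
def build_request(method: str, host: str, path: str, body: str = "") -> bytes:
    """Serialize an HTTP/1.1 request; headers are ASCII text, the body is bytes."""
    payload = body.encode("utf-8")
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    if method == "POST":
        # Assumed content type for this sketch; adjust for JSON, forms, etc.
        lines.append("Content-Type: text/plain; charset=utf-8")
        # Length of the encoded payload, not len(body).
        lines.append(f"Content-Length: {len(payload)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + payload
```

Calling build_request("POST", "example.com", "/x", "héllo") yields a Content-Length of 6 even though the string has five characters, which is exactly the bug you get by measuring the str instead of the bytes.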

What You’ll Understand: Why HTTP headers are text but bodies are bytes. Why Content-Length and chunked encoding are two solutions to the same problem: “how does the client know when the body ends?” Why persistent connections need framing. Why your requests.post() call sets headers you never asked for. Why HTTP/2 exists.
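
The "when does the body end?" question can be sketched in a few lines. The code below assumes the whole response is already buffered and that header names were lower-cased during parsing, as in the starting point; decode_chunked and extract_body are hypothetical helper names. It ignores trailers after the final chunk, which a complete client would need to consider.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked body that has already been fully buffered."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)   # raises ValueError on truncated input
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size_field = data[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break                             # last chunk; trailers (if any) follow
        body += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not terminated by CRLF")
        pos += size + 2
    return bytes(body)


def extract_body(headers: dict, payload: bytes) -> bytes:
    """Pick the framing rule: chunked first, then Content-Length, else read-to-close."""
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return decode_chunked(payload)
    if "content-length" in headers:
        return payload[: int(headers["content-length"])]
    return payload                            # body was delimited by connection close
```

Working on a fully buffered response keeps the example short, but it sidesteps the interesting part: a streaming client has to parse chunk sizes while data is still arriving, which is where persistent connections force you to get framing exactly right.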

Project 4: Build a Container

Learning Objectives: Understand that containers are not virtual machines. They are processes with restricted views of the system, implemented through Linux kernel features that are individually simple.

Specification:

Build a minimal container runtime using Linux namespaces and cgroups:

Isolate the process’s view of the filesystem (mount namespace)
Isolate the process’s view of other processes (PID namespace)
Isolate the hostname (UTS namespace)
Limit memory usage to a configurable cap (cgroups v2)
Run a specified command inside the isolated environment

Requirements:

Use unshare (shell) or Python’s ctypes to create namespaces
Create a minimal root filesystem using debootstrap or extract an Alpine Linux rootfs tar
Use pivot_root or chroot to change the root directory
Write to /sys/fs/cgroup/ to set memory limits
The contained process should see itself as PID 1
The contained process should not see host processes
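
The cgroup requirement comes down to a few file writes. In cgroups v2, memory.max holds the limit in bytes and cgroup.procs accepts the PID to move into the group. The sketch below assumes you run as root on a system where the memory controller is enabled for child groups; the function name apply_memory_limit and its cgroup_root parameter are inventions for this example (the parameter exists so the logic can be exercised against an ordinary directory).

```python
import os


def apply_memory_limit(name: str, limit_bytes: int, pid: int,
                       cgroup_root: str = "/sys/fs/cgroup") -> str:
    """Create a cgroup v2 group, cap its memory, and move pid into it."""
    group = os.path.join(cgroup_root, name)
    os.makedirs(group, exist_ok=True)
    # memory.max: hard limit in bytes for everything in the group.
    with open(os.path.join(group, "memory.max"), "w") as f:
        f.write(str(limit_bytes))
    # cgroup.procs: writing a PID migrates that process into the group.
    with open(os.path.join(group, "cgroup.procs"), "w") as f:
        f.write(str(pid))
    return group
```

On a real system you would call this with the PID of the process you are about to isolate, for example apply_memory_limit("demo", 64 * 1024 * 1024, os.getpid()) before exec'ing the contained command.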

Suggested Approach: Start with a shell script that uses unshare to create a PID namespace and runs /bin/sh inside it. Then add a mount namespace with a separate rootfs. Then add cgroups for memory limits. Then port it to Python using os.unshare() (Python 3.12+) or ctypes for older versions. Each step adds one isolation mechanism.
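
One way to bridge the shell and Python stages is to have Python assemble the unshare invocation and hand it to subprocess. The sketch below only builds the argument list; it does not run anything. It assumes the util-linux unshare tool, a rootfs that contains /bin/sh, hostname, and mount (an Alpine rootfs does, via BusyBox), and it uses chroot rather than pivot_root for brevity. The function name build_unshare_command is hypothetical.

```python
import shlex
from typing import List


def build_unshare_command(rootfs: str, hostname: str, command: List[str]) -> List[str]:
    """Build an argv that runs `command` in new PID, mount, and UTS namespaces."""
    # Runs inside the chroot: set the hostname, mount a fresh /proc so tools
    # like ps see only the new PID namespace, then exec the target command.
    inner = (
        f"hostname {shlex.quote(hostname)} && "
        "mount -t proc proc /proc && "
        f"exec {shlex.join(command)}"
    )
    return [
        "unshare",
        "--fork",    # fork so the child becomes PID 1 in the new PID namespace
        "--pid",
        "--mount",
        "--uts",
        "chroot", rootfs,
        "/bin/sh", "-c", inner,
    ]
```

Passing the result to subprocess.run(...) as root should drop you into the isolated environment. The exec matters: without it the shell stays PID 1 and your command runs as its child, which changes how signals reach it.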

What You’ll Understand: That Docker is not magic. That containers are composed of roughly five kernel features, each doing one thing. That namespace isolation is imperfect and breakable. Why container escapes are possible and what they exploit. Why you see “PID 1 behavior” bugs in containerized applications.

Project 5: Build a Load Balancer

Learning Objectives: Understand how traffic distribution works at the TCP level, why connection management matters, and what health checks actually require.

Specification:

Implement a TCP round-robin load balancer in Python:

Accept incoming TCP connections on a configured port
Forward each connection to one backend, chosen from a pool of backend servers in round-robin order
Proxy data bidirectionally between client and backend
Remove backends that fail health checks; re-add them when they recover
Log connections: timestamp, client address, selected backend, bytes transferred

Requirements:

Use Python’s socket module directly, with select/poll for multiplexing (no external libraries)
Health check: attempt a TCP connection to each backend every 10 seconds
Handle backends going down mid-connection gracefully (close the client socket, log the error)
Support at least 50 concurrent proxied connections

Suggested Approach: Start with a single-backend proxy that connects a client socket to a backend socket and shuttles bytes between them using select. Then add multiple backends with round-robin selection. Then add health checks in a background thread. Then add graceful failure handling.
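
The selection and health-check pieces are separable from the proxying, and small enough to sketch on their own. The names BackendPool, mark, next_backend, and tcp_health_check below are illustrative, and the 2-second timeout is an arbitrary assumption. The lock is there because, in the suggested design, the health-check thread and the accept loop touch the same state.

```python
import socket
import threading


class BackendPool:
    """Round-robin selection over backends, skipping ones marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)        # [(host, port), ...]
        self.healthy = set(self.backends)     # everything starts as healthy
        self._cursor = 0
        self._lock = threading.Lock()

    def mark(self, backend, is_up: bool) -> None:
        with self._lock:
            if is_up:
                self.healthy.add(backend)
            else:
                self.healthy.discard(backend)

    def next_backend(self):
        """Return the next healthy backend, or None if all are down."""
        with self._lock:
            for _ in range(len(self.backends)):
                candidate = self.backends[self._cursor]
                self._cursor = (self._cursor + 1) % len(self.backends)
                if candidate in self.healthy:
                    return candidate
        return None


def tcp_health_check(backend, timeout: float = 2.0) -> bool:
    """True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False
```

A background thread would loop over pool.backends every 10 seconds and call pool.mark(b, tcp_health_check(b)). Note what this check proves: only that something accepted the connection, not that the application behind it can answer a request.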

What You’ll Understand: Why load balancers are not just “routing.” Why connection draining is hard. Why health checks can lie (the backend accepts TCP connections but the application is deadlocked). Why sticky sessions exist. Why L4 and L7 load balancing are fundamentally different problems. Why NGINX’s connection handling is impressive.

The Common Thread

Every one of these projects gives you the same gift: the ability to look at a production system and see the mechanism instead of the label. You stop saying “the database is slow” and start saying “the LSM compaction is contending with read operations on the same disk.” You stop saying “the container crashed” and start saying “PID 1 received SIGTERM and the application didn’t handle it.” You stop saying “the network is flaky” and start saying “the TCP retransmission timeout is too aggressive for this link latency.”

That specificity is the difference between an engineer who waits for the abstraction to fix itself and one who fixes the problem in the layer where it actually lives. Build these projects. Break them. Fix them. That’s the curriculum.