Projects That Build Understanding
Five detailed project specifications — a memory allocator, a key-value store, an HTTP client, a container, and a load balancer — each with clear learning objectives, a functional specification, a suggested implementation approach, and an explanation of what you'll actually understand when you're done. Includes Python starter code for a raw-socket HTTP client.
Reading about systems is not the same as building them. You can read every page of CSAPP and still not understand cache behavior until you’ve watched your own code run three times slower because you iterated over a matrix column-by-column instead of row-by-row. These five projects are designed to produce that kind of understanding — the kind that lives in your fingers, not just your head.
Each project is a specification, not a tutorial. You get learning objectives, functional requirements, and a suggested approach. You don’t get step-by-step instructions, because step-by-step instructions teach you to follow instructions.
Project 1: Build a Memory Allocator
Learning Objectives: Understand how dynamic memory allocation works beneath malloc() and free(). Experience fragmentation, coalescing, and the trade-off between allocation speed and memory utilization.
Specification:
Implement a memory allocator in C that provides two functions:
- void* my_malloc(size_t size): allocate at least size bytes and return a pointer to the usable region
- void my_free(void* ptr): mark the region as available for future allocations
Requirements:
- Use sbrk() or mmap() to request memory from the OS
- Maintain a free list of available blocks
- On free(), coalesce adjacent free blocks into a single larger block
- Split blocks that are significantly larger than the requested size
- Handle alignment (return 8-byte aligned addresses)
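The alignment requirement comes down to rounding every request up to a multiple of 8. A minimal sketch of that arithmetic follows, written in Python for brevity even though the project itself is in C. The name align_up is a hypothetical helper, not part of the specification, and the bit trick assumes the alignment is a power of two.

```python
ALIGNMENT = 8  # bytes, per the requirement above


def align_up(size: int, alignment: int = ALIGNMENT) -> int:
    """Round size up to the next multiple of alignment (must be a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


for requested in (1, 8, 13, 24):
    print(requested, "->", align_up(requested))
```

The same expression works unchanged in C. It only yields aligned pointers if the block header size is itself a multiple of 8 and the heap starts on an aligned address.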
Suggested Approach: Start with a linked list of free blocks. Each block has a header containing its size and whether it’s free. malloc walks the free list looking for a block large enough (first-fit is fine). free marks a block as available and checks its neighbors. Get first-fit working, then implement coalescing, then splitting.
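Before writing the C, it can help to model the bookkeeping on its own. The sketch below simulates first-fit, splitting, and coalescing over a fixed region of offsets. It is a toy under stated assumptions: ToyHeap, Block, and MIN_SPLIT are hypothetical names, headers take no space, and blocks live in a Python list rather than in the heap itself.

```python
from dataclasses import dataclass

MIN_SPLIT = 16  # assumed threshold: split only if the remainder is at least this big


@dataclass
class Block:
    offset: int
    size: int
    free: bool = True


class ToyHeap:
    """Allocator bookkeeping over a fixed region; blocks stay in address order."""

    def __init__(self, capacity: int):
        self.blocks = [Block(0, capacity)]

    def malloc(self, size: int):
        size = (size + 7) & ~7  # round up to 8-byte alignment
        for i, block in enumerate(self.blocks):
            if block.free and block.size >= size:  # first fit
                remainder = block.size - size
                if remainder >= MIN_SPLIT:
                    # Split: keep the front, leave the tail as a new free block.
                    block.size = size
                    self.blocks.insert(i + 1, Block(block.offset + size, remainder))
                block.free = False
                return block.offset
        return None  # a real allocator would ask the OS for more memory here

    def free(self, offset: int) -> None:
        i = next(i for i, b in enumerate(self.blocks) if b.offset == offset)
        self.blocks[i].free = True
        # Coalesce with the following block, then with the preceding one.
        if i + 1 < len(self.blocks) and self.blocks[i + 1].free:
            self.blocks[i].size += self.blocks.pop(i + 1).size
        if i > 0 and self.blocks[i - 1].free:
            self.blocks[i - 1].size += self.blocks.pop(i).size


heap = ToyHeap(128)
a, b, c = heap.malloc(24), heap.malloc(24), heap.malloc(24)
heap.free(a)
heap.free(c)
# 104 bytes are free in total, but split across two non-adjacent blocks.
print("fragmented:", heap.malloc(100))
heap.free(b)
print("after coalescing:", heap.malloc(100))
```

The first request for 100 bytes fails even though enough memory is free in total, which is fragmentation in miniature. Once the middle block is freed and its neighbors merge, the same request succeeds.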
What You’ll Understand: Why malloc is not trivial. Why fragmentation is an emergent property of any allocation strategy. Why languages with garbage collectors pay a real cost for compaction. Why pool allocators exist. When someone says “memory leak,” you’ll think in terms of blocks and free lists instead of shrugging and restarting the process.
Project 2: Build a Key-Value Store
Learning Objectives: Understand how databases persist data to disk, why write-ahead logs exist, and why the performance characteristics of writes and reads are fundamentally different in storage engines.
Specification:
Implement a persistent key-value store in Python with this interface:
- put(key: str, value: str): store a key-value pair, overwriting if the key exists
- get(key: str) -> str | None: retrieve the value for a key, or None if absent
- delete(key: str): remove a key
Requirements:
- Data must survive process restarts (persist to disk)
- Implement a simplified LSM (Log-Structured Merge) tree:
  - Writes go to an in-memory sorted dict (the “memtable”)
  - When the memtable exceeds 1000 entries, flush it to a sorted file on disk (an “SSTable”)
  - Reads check the memtable first, then scan SSTables from newest to oldest
  - Implement compaction: merge multiple SSTables into a single sorted file
- The store must handle at least 10,000 puts and gets without corruption
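The write and read paths in these requirements can be sketched in a few dozen lines. This is one possible shape, not a reference design: TinyLSM, the .sst file naming, and the JSON-lines file format are all assumptions made for illustration, and delete and compaction are left out.

```python
import json
import os
import tempfile

MEMTABLE_LIMIT = 1000  # entries, from the requirement above


class TinyLSM:
    def __init__(self, directory: str, limit: int = MEMTABLE_LIMIT):
        self.directory = directory
        self.limit = limit
        self.memtable = {}
        # Existing SSTables, oldest first; zero-padded names sort in write order.
        self.sstables = sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if name.endswith(".sst")
        )

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) > self.limit:
            self.flush()

    def flush(self) -> None:
        if not self.memtable:
            return
        path = os.path.join(self.directory, f"{len(self.sstables):06d}.sst")
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.memtable):  # keys are written in sorted order
                f.write(json.dumps([key, self.memtable[key]]) + "\n")
        self.sstables.append(path)
        self.memtable = {}

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):  # newest SSTable first
            with open(path, encoding="utf-8") as f:
                for line in f:
                    k, v = json.loads(line)
                    if k == key:
                        return v
                    if k > key:  # sorted file: the key cannot appear later
                        break
        return None


with tempfile.TemporaryDirectory() as directory:
    db = TinyLSM(directory, limit=2)
    for key, value in [("a", "1"), ("b", "2"), ("c", "3"), ("a", "9")]:
        db.put(key, value)
    print(db.get("a"), db.get("b"), db.get("missing"))
```

Note what this sketch gets wrong on purpose: anything still in the memtable when the process exits is gone, and every miss scans every file. Both gaps point at parts of the project you still have to build.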
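Suggested Approach: Start with a simple append-only file where every operation writes a line. Then add the memtable. Then add SSTable flushing. Then add compaction. Once the memtable exists, keep the append-only file as a write-ahead log; without it, entries that have not yet been flushed to an SSTable are lost on restart. Each step is a meaningful improvement that teaches you something about why real databases are structured the way they are.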
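The first step, an append-only file that is replayed on startup, might look like the sketch below. AppendOnlyStore and the JSON-lines record format are assumptions chosen for illustration; the fsync call is there because a write that only reaches the OS buffer has not yet survived a power loss.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    def __init__(self, path: str):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    try:
                        op, key, value = json.loads(line)
                    except ValueError:
                        break  # torn final record from a crash mid-write
                    if op == "put":
                        self.data[key] = value
                    else:
                        self.data.pop(key, None)
        self.log = open(path, "a", encoding="utf-8")

    def _append(self, record) -> None:
        self.log.write(json.dumps(record) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # force the record to disk before returning

    def put(self, key: str, value: str) -> None:
        self._append(["put", key, value])
        self.data[key] = value

    def delete(self, key: str) -> None:
        self._append(["del", key, None])
        self.data.pop(key, None)

    def get(self, key: str):
        return self.data.get(key)

    def close(self) -> None:
        self.log.close()


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "store.log")
    store = AppendOnlyStore(path)
    store.put("color", "blue")
    store.put("shape", "round")
    store.delete("shape")
    store.close()

    restarted = AppendOnlyStore(path)  # state is rebuilt by replaying the log
    print(restarted.get("color"), restarted.get("shape"))
    restarted.close()
```

The log grows forever and startup time grows with it, which is the pressure that motivates the memtable, SSTables, and compaction in the later steps.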
What You’ll Understand: Why LSM trees favor writes over reads. Why compaction is necessary and expensive. Why write-ahead logs prevent data loss. Why your database vendor talks about “write amplification” and “space amplification.” Why choosing between an LSM and a B-tree is an actual engineering decision with measurable consequences.
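Compaction is at heart a merge of sorted runs in which the newest version of each key wins. The sketch below works on in-memory lists instead of files to keep the logic visible. The compact function and the use of None as a deletion marker (a "tombstone") are assumptions for illustration.

```python
import heapq

TOMBSTONE = None  # assumed marker meaning "this key was deleted"


def compact(runs):
    """Merge sorted runs into one. Runs are ordered oldest to newest.

    Each run is a list of (key, value) pairs sorted by key, with unique keys.
    """
    # Tag entries with the negated run age so that, for equal keys,
    # the entry from the newest run sorts first.
    tagged = (
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    merged = []
    last_key = object()  # sentinel that equals no real key
    for key, _, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # older version of a key that was already handled
        last_key = key
        if value is not TOMBSTONE:
            merged.append((key, value))
    return merged


oldest = [("a", "1"), ("b", "2"), ("c", "3")]
middle = [("b", "20"), ("c", TOMBSTONE)]
newest = [("a", "100")]
print(compact([oldest, middle, newest]))
```

Dropping tombstones is only safe here because every run takes part in the merge. A partial compaction has to keep them, or a deleted key could reappear from an older file. Every surviving entry is also rewritten to disk, which is where write amplification comes from.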
Project 3: Build an HTTP Client
Learning Objectives: Understand the HTTP protocol at the byte level. See how request formatting, header parsing, chunked encoding, and redirects actually work beneath requests.get().
Specification:
Implement an HTTP client in Python using raw sockets:
- get(url: str) -> Response: fetch a URL via GET
- post(url: str, body: str) -> Response: send a POST with a body
Requirements:
- Parse the URL to extract host, port, and path
- Resolve DNS and connect via TCP
- Send properly formatted HTTP/1.1 requests with Host, Connection, and Content-Length headers
- Parse the response: status line, headers, body
- Handle Content-Length-delimited bodies
- Handle chunked transfer encoding
- Follow 301/302 redirects (up to 5 hops)
- Handle HTTPS via ssl.SSLContext.wrap_socket() (this one you’re allowed to delegate to the standard library; the older module-level ssl.wrap_socket() was removed in Python 3.12)
Here’s your starting point:
    import socket
    import ssl
    from urllib.parse import urljoin, urlparse


    class Response:
        def __init__(self, status_code, headers, body):
            self.status_code = status_code
            self.headers = headers  # dict keyed by lowercased header name
            self.body = body        # raw bytes, not decoded


    def http_get(url, max_redirects=5):
        # The initial request plus up to max_redirects redirect hops.
        for _ in range(max_redirects + 1):
            parsed = urlparse(url)
            host = parsed.hostname
            port = parsed.port or (443 if parsed.scheme == 'https' else 80)
            path = parsed.path or '/'
            if parsed.query:
                path += '?' + parsed.query

            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(10)
            if parsed.scheme == 'https':
                context = ssl.create_default_context()
                sock = context.wrap_socket(sock, server_hostname=host)

            try:
                # connect() resolves the hostname; for HTTPS it also runs the TLS handshake.
                sock.connect((host, port))
                request = (
                    f'GET {path} HTTP/1.1\r\n'
                    f'Host: {host}\r\n'
                    f'Connection: close\r\n'
                    f'User-Agent: learning-http-client/1.0\r\n'
                    f'\r\n'
                )
                sock.sendall(request.encode())

                # Receive response: this is where you start building
                raw = b''
                while True:
                    chunk = sock.recv(4096)
                    if not chunk:  # server closed the connection
                        break
                    raw += chunk
            finally:
                sock.close()

            # Headers and body are separated by one blank line.
            header_end = raw.find(b'\r\n\r\n')
            if header_end == -1:
                raise ValueError('Malformed response: no end of headers found')
            header_block = raw[:header_end].decode('iso-8859-1')
            body = raw[header_end + 4:]

            lines = header_block.split('\r\n')
            status_code = int(lines[0].split(' ', 2)[1])
            headers = {}
            for line in lines[1:]:
                # The space after the colon is optional, so split on ':' alone.
                name, _, value = line.partition(':')
                headers[name.strip().lower()] = value.strip()

            if status_code in (301, 302) and 'location' in headers:
                # Location may be relative; resolve it against the current URL.
                url = urljoin(url, headers['location'])
                continue
            return Response(status_code, headers, body)
        raise Exception('Too many redirects')
This handles GET with redirects and TLS, but it reads the entire response into memory at once (breaking on large responses), doesn’t handle chunked encoding, and doesn’t support POST. Those are your problems to solve.
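For POST, the main new concern is formatting the request so the server knows where the body ends. Here is a hedged sketch of just that step. The name build_post_request and the default Content-Type are assumptions, and sending the bytes and parsing the reply are left to the client you are building.

```python
def build_post_request(host: str, path: str, body: str,
                       content_type: str = "text/plain; charset=utf-8") -> bytes:
    """Format an HTTP/1.1 POST request as the bytes that go on the wire."""
    payload = body.encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Connection: close\r\n"
        f"Content-Type: {content_type}\r\n"
        # Content-Length counts encoded bytes, not characters.
        f"Content-Length: {len(payload)}\r\n"
        f"\r\n"
    )
    return head.encode("ascii") + payload


request = build_post_request("example.com", "/submit", "héllo")
print(request.decode("utf-8"))
```

The body here is five characters but six bytes. Computing Content-Length from the string instead of the encoded payload is a common way to produce requests that hang or get truncated.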
What You’ll Understand: Why HTTP headers are text but bodies are bytes. Why Content-Length and chunked encoding are two solutions to the same problem: “how does the client know when the body ends?” Why persistent connections need framing. Why your requests.post() call sets headers you never asked for. Why HTTP/2 exists.
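Chunked encoding answers the "where does the body end?" question by prefixing each piece with its length in hexadecimal and finishing with a zero-length chunk. A minimal decoder for a body that has already been fully read might look like this. decode_chunked is a hypothetical helper; it ignores trailers and assumes well-formed input.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked-encoded body into the original bytes."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size = int(data[pos:line_end].split(b";", 1)[0], 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailers after it are ignored here
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(wire))
```

A streaming client would run the same loop against the socket instead of a buffer, reading exactly as many bytes as each size line announces.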
Project 4: Build a Container
Learning Objectives: Understand that containers are not virtual machines. They are processes with restricted views of the system, implemented through Linux kernel features that are individually simple.
Specification:
Build a minimal container runtime using Linux namespaces and cgroups:
- Isolate the process’s view of the filesystem (mount namespace)
- Isolate the process’s view of other processes (PID namespace)
- Isolate the hostname (UTS namespace)
- Limit memory usage to a configurable cap (cgroups v2)
- Run a specified command inside the isolated environment
Requirements:
- Use unshare (shell) or Python’s ctypes to create namespaces
- Create a minimal root filesystem using debootstrap or extract an Alpine Linux rootfs tar
- Use pivot_root or chroot to change the root directory
- Write to /sys/fs/cgroup/ to set memory limits
/sys/fs/cgroup/to set memory limits - The contained process should see itself as PID 1
- The contained process should not see host processes
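The cgroups part of these requirements is mostly file writes. The sketch below shows the two that matter for a memory cap, using the cgroups v2 interface files memory.max and cgroup.procs. The function name limit_memory and its cgroup_root parameter are assumptions; the parameter exists so the example can run against a temporary directory without root.

```python
import os
import tempfile
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # usual cgroups v2 mount point


def limit_memory(name: str, max_bytes: int, pid: int,
                 cgroup_root: Path = CGROUP_ROOT) -> Path:
    """Create a cgroup, cap its memory, and move a process into it."""
    group = cgroup_root / name
    group.mkdir(exist_ok=True)
    (group / "memory.max").write_text(str(max_bytes))  # hard limit in bytes
    (group / "cgroup.procs").write_text(str(pid))      # attach the process
    return group


# Dry run against a temporary directory instead of the real cgroup filesystem.
with tempfile.TemporaryDirectory() as fake_root:
    group = limit_memory("demo", 64 * 1024 * 1024, os.getpid(), Path(fake_root))
    print((group / "memory.max").read_text())
```

On a real system this needs root, and the memory controller must be enabled for child groups in the parent's cgroup.subtree_control before memory.max appears.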
Suggested Approach: Start with a shell script that uses unshare to create a PID namespace and runs /bin/sh inside it. Then add a mount namespace with a separate rootfs. Then add cgroups for memory limits. Then port it to Python using os.unshare() (Python 3.12+) or ctypes for older versions. Each step adds one isolation mechanism.
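The shell stage of that progression can be driven from Python while you are still experimenting. The sketch below only builds the argument list for util-linux unshare and coreutils chroot, and it does not run anything, since doing so needs root and a prepared rootfs. build_unshare_command and the /tmp/rootfs path are hypothetical.

```python
def build_unshare_command(rootfs: str, command: list[str]) -> list[str]:
    """Build an argv that runs command in new PID, mount, and UTS namespaces."""
    return [
        "unshare",
        "--pid",    # new PID namespace
        "--fork",   # fork so the child becomes PID 1 in that namespace
        "--mount",  # new mount namespace
        "--uts",    # new UTS namespace (hostname)
        "chroot", rootfs,
        *command,
    ]


argv = build_unshare_command("/tmp/rootfs", ["/bin/sh"])
print(" ".join(argv))
# To actually launch it (as root, with a populated rootfs):
#   import subprocess; subprocess.run(argv)
```

Inside the resulting shell, tools like ps will not work until /proc is mounted within the new root, which is a useful first thing to trip over.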
What You’ll Understand: That Docker is not magic. That containers are composed of roughly five kernel features, each doing one thing. That namespace isolation is imperfect and breakable. Why container escapes are possible and what they exploit. Why you see “PID 1 behavior” bugs in containerized applications.
Project 5: Build a Load Balancer
Learning Objectives: Understand how traffic distribution works at the TCP level, why connection management matters, and what health checks actually require.
Specification:
Implement a TCP round-robin load balancer in Python:
- Accept incoming TCP connections on a configured port
- Forward each connection to a pool of backend servers in round-robin order
- Proxy data bidirectionally between client and backend
- Remove backends that fail health checks; re-add them when they recover
- Log connections: timestamp, client address, selected backend, bytes transferred
Requirements:
- Use raw sockets and select/poll for multiplexing (no external libraries)
- Health check: attempt a TCP connection to each backend every 10 seconds
- Handle backends going down mid-connection gracefully (close the client socket, log the error)
- Support at least 50 concurrent proxied connections
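The health check in these requirements is a plain TCP connect with a timeout. A minimal sketch, assuming a hypothetical helper named tcp_health_check and a 2-second timeout that the specification does not prescribe:

```python
import socket


def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Demonstrate against a throwaway listener on the loopback interface.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
print("listener up:", tcp_health_check("127.0.0.1", port))
listener.close()
print("listener closed:", tcp_health_check("127.0.0.1", port))
```

This check only proves that something accepts connections on the port. It says nothing about whether the application behind it can still answer a request.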
Suggested Approach: Start with a single-backend proxy that connects a client socket to a backend socket and shuttles bytes between them using select. Then add multiple backends with round-robin selection. Then add health checks in a background thread. Then add graceful failure handling.
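Round-robin selection gets slightly more interesting once backends can drop out. One way to sketch it, with BackendPool and its method names as assumptions, is to keep the full backend list in a fixed order and skip entries that are currently marked unhealthy. The lock is there because the suggested design runs health checks in a background thread.

```python
import threading


class BackendPool:
    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self.next_index = 0
        self.lock = threading.Lock()  # shared with the health-check thread

    def mark(self, backend, is_up: bool) -> None:
        """Record the latest health-check result for a backend."""
        with self.lock:
            if is_up:
                self.healthy.add(backend)
            else:
                self.healthy.discard(backend)

    def pick(self):
        """Return the next healthy backend in rotation, or None if all are down."""
        with self.lock:
            for _ in range(len(self.backends)):
                backend = self.backends[self.next_index]
                self.next_index = (self.next_index + 1) % len(self.backends)
                if backend in self.healthy:
                    return backend
        return None


pool = BackendPool([("10.0.0.1", 8080), ("10.0.0.2", 8080), ("10.0.0.3", 8080)])
print([pool.pick()[0] for _ in range(4)])
pool.mark(("10.0.0.2", 8080), False)
print([pool.pick()[0] for _ in range(3)])
```

Keeping unhealthy backends in the list, instead of deleting them, means a recovered backend returns to its old position in the rotation without any extra bookkeeping.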
What You’ll Understand: Why load balancers are not just “routing.” Why connection draining is hard. Why health checks can lie (the backend accepts TCP connections but the application is deadlocked). Why sticky sessions exist. Why L4 and L7 load balancing are fundamentally different problems. Why NGINX’s connection handling is impressive.
The Common Thread
Every one of these projects gives you the same gift: the ability to look at a production system and see the mechanism instead of the label. You stop saying “the database is slow” and start saying “the LSM compaction is contending with read operations on the same disk.” You stop saying “the container crashed” and start saying “PID 1 received SIGTERM and the application didn’t handle it.” You stop saying “the network is flaky” and start saying “the TCP retransmission timeout is too aggressive for this link latency.”
That specificity is the difference between an engineer who waits for the abstraction to fix itself and one who fixes the problem in the layer where it actually lives. Build these projects. Break them. Fix them. That’s the curriculum.