Optimizing I/O Performance: Building a Faster Alternative to cp and rsync
These articles are AI-generated summaries. Please check the original sources for full details.
I built a faster alternative to cp and rsync — here’s how it works
Systems engineer Krit K developed fast-copy to overcome the performance bottlenecks inherent in traditional Unix file utilities. The tool achieves significant speed gains by resolving physical disk offsets to transform random I/O into sequential reads.
Why This Matters
Traditional file utilities like cp -r read files in directory order, which on mechanical drives translates into random disk access. In trees with many small files, each seek costs 5-10 ms, so the time lost to head movement grows linearly with the file count, and standard tools do nothing to reduce it. Resolving physical offsets through low-level interfaces such as FIEMAP and fcntl lets a copy tool read files in on-disk order and approach sequential disk speed. For remote copies, streaming over raw SSH separately avoids the protocol overhead typically found in SFTP and SCP transfers.
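The offset-sorting idea can be sketched in a few lines of Python. This is an illustrative sketch, not fast-copy's actual code: the function names are hypothetical, the FIEMAP request number assumes a common Linux architecture such as x86-64 or arm64, and only the first extent of each file is consulted. On platforms or filesystems without FIEMAP support the lookup returns None and those files are simply placed last.

```python
import os
import struct

# Assumption: _IOWR('f', 11, struct fiemap) as encoded on x86-64 / arm64 Linux.
FS_IOC_FIEMAP = 0xC020660B
FIEMAP_HEADER = struct.Struct("=QQIIII")     # struct fiemap, 32 bytes
FIEMAP_EXTENT = struct.Struct("=QQQQQIIII")  # struct fiemap_extent, 56 bytes


def first_physical_offset(path):
    """Return the physical byte offset of the file's first extent, or None."""
    try:
        import fcntl  # not available on Windows

        buf = bytearray(FIEMAP_HEADER.size + FIEMAP_EXTENT.size)
        # Map from byte 0 to the end of the file, asking for at most one extent.
        FIEMAP_HEADER.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, 0, 0, 1, 0)
        fd = os.open(path, os.O_RDONLY)
        try:
            fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
        finally:
            os.close(fd)
        mapped_extents = FIEMAP_HEADER.unpack_from(buf, 0)[3]
        if mapped_extents == 0:
            return None  # empty or fully sparse file
        # Second field of the extent record is fe_physical.
        return FIEMAP_EXTENT.unpack_from(buf, FIEMAP_HEADER.size)[1]
    except (ImportError, OSError):
        return None  # unsupported platform, filesystem, or unreadable file


def sort_for_sequential_read(paths, offset_of=first_physical_offset):
    """Order paths by physical offset; files with unknown offsets go last."""
    def key(path):
        offset = offset_of(path)
        return (offset is None, offset or 0, path)

    return sorted(paths, key=key)
```

Passing the lookup as the `offset_of` parameter keeps the sort independent of the platform, so a macOS or Windows resolver could be swapped in without changing the ordering logic.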
Key Insights
- Hard drive seek latency of 5-10 ms per file creates massive overhead when tens of thousands of small files are copied in directory order.
- fast-copy uses Linux FIEMAP, macOS fcntl, and Windows FSCTL to resolve the physical block positions of files before copying begins.
- Deduplication using xxHash-128 saved 378.9 MB of I/O and reduced transfer volume by nearly 50% in a 92K file test case.
- SSH tar streaming eliminates SFTP protocol overhead by piping chunked ~100 MB batches directly into a remote tar process.
- A persistent SQLite database of file hashes enables efficient incremental copies by skipping previously verified data.
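The deduplication and incremental-copy points above can be illustrated together. The following is a minimal sketch under stated assumptions: the table name, schema, and function names are hypothetical, and the standard library's 128-bit BLAKE2b stands in for xxHash-128 so the example has no third-party dependency. A real tool would record a hash only after the copy has been verified.

```python
import hashlib
import sqlite3


def content_hash(path):
    """128-bit content digest (BLAKE2b stand-in, not xxHash-128).

    Reads the whole file at once, which is acceptable for a sketch.
    """
    with open(path, "rb") as f:
        return hashlib.blake2b(f.read(), digest_size=16).hexdigest()


def open_cache(db_path):
    """Open (or create) the persistent hash database."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "digest TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return db


def plan_copy(paths, db):
    """Split paths into files to copy and duplicates of known content.

    Returns (to_copy, skipped), where skipped holds
    (duplicate_path, first_path_seen_with_same_content) pairs.
    """
    to_copy, skipped = [], []
    for path in paths:
        digest = content_hash(path)
        row = db.execute(
            "SELECT path FROM seen WHERE digest = ?", (digest,)
        ).fetchone()
        if row is None:
            db.execute(
                "INSERT INTO seen (digest, path) VALUES (?, ?)", (digest, path)
            )
            to_copy.append(path)
        else:
            skipped.append((path, row[0]))
    db.commit()
    return to_copy, skipped
```

Because the table persists on disk, running `plan_copy` again with the same database reports every unchanged file as skipped, which is the incremental behavior described above.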
Working Examples
Basic command to execute a local-to-local file copy using fast-copy.
python fast_copy.py /source /destination
Installation of optional dependencies for SSH transfers and high-performance hashing.
pip install paramiko
pip install xxhash
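Since xxhash is described as optional, a script can treat it as such at import time. The sketch below shows one way to do that; whether fast-copy falls back in this manner is not stated in the source, the helper names are hypothetical, and `xxh3_128` is assumed to be the 128-bit variant meant by "xxHash-128".

```python
import hashlib

try:
    import xxhash  # optional dependency: pip install xxhash

    HASH_NAME = "xxh3_128"

    def new_hasher():
        return xxhash.xxh3_128()

except ImportError:
    # Fallback: 128-bit BLAKE2b from the standard library.
    HASH_NAME = "blake2b-128"

    def new_hasher():
        return hashlib.blake2b(digest_size=16)


def hash_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never load fully into memory."""
    hasher = new_hasher()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Digests from the two hashers are not interchangeable, so a stored hash database should also record which algorithm produced it (here, `HASH_NAME`).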
Practical Applications
- Use Case: Moving 92K files to a USB drive at 28.5 MB/s using physical disk offset sorting. Pitfall: Using standard cp -r results in excessive head movement and significantly slower completion times.
- Use Case: Transferring bulk data to a Synology NAS with SFTP disabled by leveraging raw SSH tar streaming. Pitfall: Relying on SCP, which may top out at 1-2 MB/s due to protocol overhead.
- Use Case: Incremental backups of node_modules and developer environments using hard links for deduplication. Pitfall: Redundantly copying identical files across multiple project directories, wasting storage and bandwidth.
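The tar-streaming approach from the NAS use case can be sketched with the standard library alone. This is an assumption-laden illustration rather than fast-copy's implementation: the function names and the exact batching rule are hypothetical, and the output stream here is any writable binary file object standing in for the stdin of a remote tar process.

```python
import io
import os
import tarfile

BATCH_BYTES = 100 * 1024 * 1024  # roughly 100 MB per batch, as in the article


def batch_by_size(files, limit=BATCH_BYTES):
    """Group (path, size) pairs into batches whose total size stays under limit.

    A single file larger than the limit gets a batch of its own.
    """
    batch, total = [], 0
    for path, size in files:
        if batch and total + size > limit:
            yield batch
            batch, total = [], 0
        batch.append((path, size))
        total += size
    if batch:
        yield batch


def stream_batch(batch, out_stream, root):
    """Write one batch as an uncompressed tar stream to out_stream.

    Mode "w|" never seeks, so out_stream may be a pipe or socket.
    """
    with tarfile.open(fileobj=out_stream, mode="w|") as tar:
        for path, _size in batch:
            tar.add(path, arcname=os.path.relpath(path, root), recursive=False)
```

In practice `out_stream` would be the stdin of something like a remote `tar -xf - -C <destination>` started over SSH; writing to an `io.BytesIO` buffer is a convenient way to try the functions locally.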