Optimizing I/O Performance: Building a Faster Alternative to cp and rsync

These articles are AI-generated summaries. Please check the original sources for full details.

I built a faster alternative to cp and rsync — here’s how it works

Systems engineer Krit K developed fast-copy to overcome the performance bottlenecks inherent in traditional Unix file utilities. The tool gains speed by resolving each file's physical disk offset and reading files in on-disk order, which turns random I/O into sequential reads.

Why This Matters

Traditional file utilities like cp -r read files in directory order, which translates to random disk access on mechanical drives. When a directory tree holds many small files, each seek costs 5-10 ms, so total seek time grows linearly with the file count, and standard tools do nothing to reduce it. Low-level interfaces such as FIEMAP and fcntl reveal where each file sits on disk, so a copy tool can read in physical order and approach sequential disk speed. For remote transfers, streaming tar over SSH avoids the protocol overhead typically found in SFTP and SCP.
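
The offset-sorting idea can be sketched in Python for Linux. This is a minimal illustration, not fast-copy's actual code: the function names first_physical_offset and sort_by_physical_offset are hypothetical, and only the first extent of each file is consulted. The ioctl number and struct layout follow the kernel's struct fiemap definition. Where FIEMAP is unavailable (other operating systems, or filesystems that do not implement it), the sketch returns None and such files sort last.

```python
import os
import struct

FS_IOC_FIEMAP = 0xC020660B   # _IOWR('f', 11, struct fiemap) from <linux/fs.h>
FIEMAP_FLAG_SYNC = 0x1       # flush dirty data before mapping
_HEADER = struct.Struct("=QQIIII")  # struct fiemap without the extent array
_EXTENT_SIZE = 56                    # sizeof(struct fiemap_extent)


def first_physical_offset(path):
    """Return the byte offset of the file's first extent, or None if unknown."""
    try:
        import fcntl  # POSIX only
    except ImportError:
        return None
    buf = bytearray(_HEADER.size + _EXTENT_SIZE)
    # Request one extent, mapping from offset 0 to the end of the file.
    _HEADER.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, FIEMAP_FLAG_SYNC, 0, 1, 0)
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)
    except OSError:
        return None  # this filesystem or OS does not support FIEMAP
    finally:
        os.close(fd)
    mapped = struct.unpack_from("=I", buf, 20)[0]  # fm_mapped_extents
    if mapped == 0:
        return None  # empty file, or data stored inline
    return struct.unpack_from("=Q", buf, _HEADER.size + 8)[0]  # fe_physical


def sort_by_physical_offset(paths, offset_fn=first_physical_offset):
    """Order paths by on-disk position; unknown offsets sort last, by name."""
    def key(path):
        offset = offset_fn(path)
        return (offset is None, offset or 0, path)
    return sorted(paths, key=key)
```

Reading files in the order returned by sort_by_physical_offset lets the drive head sweep mostly in one direction instead of jumping between directory entries. The offset_fn parameter exists so the ordering logic can be exercised without a real disk.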

Key Insights

  • Hard drive seek latency of 5-10 ms per file adds up quickly when tens of thousands of small files are copied in directory order. Assuming one seek per file, 92,000 files would spend roughly 460 to 920 seconds on seeks alone.
  • fast-copy uses Linux FIEMAP, macOS fcntl, and Windows FSCTL to resolve the physical block positions of files before copying begins.
  • Deduplication using xxHash-128 saved 378.9 MB of I/O and reduced transfer volume by nearly 50% in a 92K-file test case.
  • SSH tar streaming eliminates SFTP protocol overhead by piping chunked ~100 MB batches directly into a remote tar process.
  • A persistent SQLite database of file hashes enables efficient incremental copies by skipping previously verified data.
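
The deduplication and hash-cache ideas in the list above can be sketched together. This is an illustration under stated assumptions rather than fast-copy's implementation: the table schema, the function names, and the rule of trusting a cached hash when size and modification time are unchanged are all hypothetical. If the optional xxhash package is not installed, the sketch falls back to a 128-bit BLAKE2b digest from the standard library, which is a substitute and not what the article describes.

```python
import hashlib
import os
import sqlite3

try:
    import xxhash  # optional: pip install xxhash

    def _new_hasher():
        return xxhash.xxh128()
except ImportError:
    def _new_hasher():
        # Fallback assumption: a 128-bit BLAKE2b digest from the stdlib.
        return hashlib.blake2b(digest_size=16)


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks and return the hex digest."""
    hasher = _new_hasher()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def open_cache(db_path):
    """Open (or create) the hash cache. The schema here is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashes ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest TEXT)"
    )
    return conn


def cached_digest(conn, path):
    """Reuse the stored digest when size and mtime are unchanged."""
    st = os.stat(path)
    row = conn.execute(
        "SELECT digest FROM hashes WHERE path = ? AND size = ? AND mtime_ns = ?",
        (path, st.st_size, st.st_mtime_ns),
    ).fetchone()
    if row:
        return row[0]
    digest = file_digest(path)
    conn.execute(
        "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, digest),
    )
    conn.commit()
    return digest


def plan_copy(conn, paths):
    """Split paths into files to copy once and duplicates of those files."""
    first_seen = {}   # digest -> first path seen with that content
    duplicates = []   # (duplicate_path, original_path)
    for path in paths:
        digest = cached_digest(conn, path)
        if digest in first_seen:
            duplicates.append((path, first_seen[digest]))
        else:
            first_seen[digest] = path
    return list(first_seen.values()), duplicates
```

Only the first list returned by plan_copy needs to be read and transferred. Each duplicate could then be recreated at the destination with os.link, pointing at the copy of its original, which matches the hard-link approach the article mentions.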

Working Examples

Basic command to execute a local-to-local file copy using fast-copy.

python fast_copy.py /source /destination

Installation of optional dependencies for SSH transfers and high-performance hashing.

pip install paramiko
pip install xxhash

Practical Applications

  • Use Case: Moving 92K files to a USB drive at 28.5 MB/s using physical disk offset sorting. Pitfall: Using standard cp -r results in excessive head movement and significantly slower completion times.
  • Use Case: Transferring bulk data to a Synology NAS with SFTP disabled by leveraging raw SSH tar streaming. Pitfall: Relying on SCP, which may top out at 1-2 MB/s due to protocol overhead.
  • Use Case: Incremental backups of node_modules and developer environments using hard links for deduplication. Pitfall: Redundantly copying identical files across multiple project directories, wasting storage and bandwidth.
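
The SSH tar streaming approach can be sketched as follows. This is a rough illustration, not the tool's code: the article lists paramiko as the SSH dependency, whereas this sketch substitutes the system ssh client through subprocess to stay within the standard library. The function names, the host and remote_dir parameters, and the assumption of key-based login with tar available on the remote side are all hypothetical.

```python
import os
import shlex
import subprocess
import tarfile

BATCH_BYTES = 100 * 1024 * 1024  # roughly 100 MB per batch, as in the article


def make_batches(paths, limit=BATCH_BYTES, size_of=os.path.getsize):
    """Group paths into consecutive batches of at most `limit` bytes each.

    A single file larger than the limit gets a batch of its own.
    """
    batches, current, current_bytes = [], [], 0
    for path in paths:
        size = size_of(path)
        if current and current_bytes + size > limit:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


def write_tar_stream(paths, root, out):
    """Write paths as an uncompressed tar stream to the file object `out`."""
    # Mode "w|" produces a non-seekable stream, which suits a pipe.
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.relpath(path, root), recursive=False)


def stream_batch_over_ssh(paths, root, host, remote_dir):
    """Pipe one batch into `tar` on the remote host (assumes key-based ssh)."""
    remote_cmd = f"tar -xf - -C {shlex.quote(remote_dir)}"
    proc = subprocess.Popen(["ssh", host, remote_cmd], stdin=subprocess.PIPE)
    try:
        write_tar_stream(paths, root, proc.stdin)
    finally:
        proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError(f"remote tar exited with status {proc.returncode}")
```

A caller would loop over make_batches(paths) and pass each batch to stream_batch_over_ssh. Because each batch travels as one continuous stream, the per-file request and acknowledgement exchanges of SFTP are avoided.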
