---
title: Built an experimental SSD-friendly VCS
date: 2026-04-23
layout: post
---

Implemented init, status, add, commit, log, show, and diff. Tracks regular files and symlinks. Didn't bother with collaborative workflows.

Moved away from the initial work tree mirroring with symlinks to a path-sorted index to minimize inode churn and avoid opening directories on every status/add command. Implemented the index to track staged, commit, and base SHA-1 hashes, mtime, and size. Like git, used mtime and size to skip entries that didn't change. Excluded file permissions.

Designed work/commit tree scans around a two-finger walk with the index. Linear index access trades random-access speed for sequential IO and keeps the memory footprint low.

Operations run in memory, using text streams and pipes wherever possible. Left MEM_LIMIT configurable to fall back to disk for large repositories:

```
my $flush = sub {
    if (!$use_disk) {
        # First spill: open an unlinked temp file with a large buffer
        ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
        $tmp_fh->setvbuf(undef, POSIX::_IOFBF(), $chunk_size);
        binmode $tmp_fh, ":raw";
        $use_disk = 1;
    }
    print $tmp_fh @buf;
    # Drop what was written so the next flush doesn't repeat it
    @buf = ();
    $buf_size = 0;
};

push @buf, $line;
$buf_size += length($line);
$tot_size += length($line);

# Spill once the total passes MEM_LIMIT, then once per chunk
if ((!$use_disk && $tot_size > MEM_LIMIT) ||
    ($use_disk && $buf_size > $chunk_size)) {
    $flush->();
}
```

Implemented the commit command to atomically save (rename) staged files, the tree, and the deltas to the object store. Bundled deltas into tarballs to conserve inodes, and gzipped tarballs larger than 512 bytes. Objects in the store are content-addressable.

Computed deltas against the first version of a file (base) to simplify reconstruction via application of a single patch instead of delta chains. When the delta outgrows the file, the file becomes the new base.

Unix diff doesn't compute binary deltas.
Rolled a basic binary diff that works well enough except for small changes that shift bytes:

```
my $patch = pack("Q", $new_size);

while (1) {
    my $read_new = sysread($f_new, my $buf_new, $blk_size);
    my $read_old = sysread($f_old, my $buf_old, $blk_size);
    last if !$read_new && !$read_old;

    # If blocks differ, record the change
    if (($buf_new // '') ne ($buf_old // '')) {
        # Format: Offset (Q), Length (L), raw data
        $patch .= pack("QL", $offset, length($buf_new)) . $buf_new;
    }
    $offset += $blk_size;
}
```

Benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:

    =============================================================
    REBASE BENCHMARK: 1000 files (100 commits)
    CONDITIONS: Depth=2, Files Mod=5%, Change=50%
    INITIAL RAW DATA SIZE: 16976 KB
    =============================================================
    SNAPSHOT: Commit #20
    -------------------------------------------------------------
    METRIC          | URN                  | GIT
    ----------------+----------------------+---------------------
    Time            | 0.29s                | 0.05s
    Max RSS         | 0.02 MB              | 0.01 MB
    Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
    Inodes          | 1578                 | 2334
    Repo size       | 20404 KB             | 19380 KB
    -------------------------------------------------------------
    SNAPSHOT: Commit #40
    -------------------------------------------------------------
    METRIC          | URN                  | GIT
    ----------------+----------------------+---------------------
    Time            | 0.54s                | 0.05s
    Max RSS         | 0.02 MB              | 0.01 MB
    Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
    Inodes          | 1607                 | 3374
    Repo size       | 20520 KB             | 23788 KB
    -------------------------------------------------------------
    SNAPSHOT: Commit #60
    -------------------------------------------------------------
    METRIC          | URN                  | GIT
    ----------------+----------------------+---------------------
    Time            | 0.31s                | 0.05s
    Max RSS         | 0.02 MB              | 0.01 MB
    Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
    Inodes          | 1635                 | 4414
    Repo size       | 20632 KB             | 28196 KB
    -------------------------------------------------------------
    SNAPSHOT: Commit #80
    -------------------------------------------------------------
    METRIC          | URN                  | GIT
    ----------------+----------------------+---------------------
    Time            | 0.29s                | 0.05s
    Max RSS         | 0.02 MB              | 0.01 MB
    Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
    Inodes          | 1664                 | 5454
    Repo size       | 20748 KB             | 32596 KB
    -------------------------------------------------------------
    SNAPSHOT: Commit #100
    -------------------------------------------------------------
    METRIC          | URN                  | GIT
    ----------------+----------------------+---------------------
    Time            | 0.54s                | 0.10s
    Max RSS         | 0.02 MB              | 0.01 MB
    Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
    Inodes          | 1693                 | 6495
    Repo size       | 20864 KB             | 37008 KB
    -------------------------------------------------------------
    TOTAL URN REBASES: 273

Git wins on speed and memory. On small repositories, Urn is competitive.

In high-revision workloads with modest per-file churn, Urn beats git on storage. Over 80 commits, git wrote 17 MB to track a 17 MB repo. Urn only wrote 0.5 MB despite 273 rebases. Git's inode count went from 2,334 to 6,495. Urn's went from 1,578 to 1,693.

Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git used up 1819 inodes and 59 MB pre-GC. After GC, the inode count dropped to 1514, but the size only shrank by 6 MB.

Commit: 49ae774
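
Addendum, for anyone wondering what reconstruction from a single patch looks like: a minimal sketch of the reverse direction of the binary diff above, assuming the same record layout (8-byte new size, then Offset (Q), Length (L), raw data). `apply_patch` is a hypothetical name, and it works on in-memory strings rather than file handles to keep it short. It is not necessarily how Urn does it. The `Q` template needs a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the new version from the base bytes and a
# patch laid out as: new size (Q), then records of
# offset (Q), length (L), raw data.
sub apply_patch {
    my ($old, $patch) = @_;

    my $new_size = unpack("Q", $patch);
    my $pos      = 8;                  # skip the 8-byte size header
    my $out      = $old;

    # Pad with NUL bytes so records can land past the old end of file
    $out .= "\0" x ($new_size - length($out)) if length($out) < $new_size;

    while ($pos < length($patch)) {
        my ($offset, $len) = unpack("QL", substr($patch, $pos, 12));
        $pos += 12;
        # Zero-length records only mark blocks past the new end; skip them
        substr($out, $offset, $len) = substr($patch, $pos, $len) if $len;
        $pos += $len;
    }

    # The header size decides where the file ends (handles shrinking)
    return substr($out, 0, $new_size);
}

# Hand-built patch for a 4-byte block size: block 1 changed, file shrank by 2
my $old   = "AAAABBBBCCCC";
my $patch = pack("Q", 10)
          . pack("QL", 4, 4) . "XXXX"
          . pack("QL", 8, 2) . "CC";
print apply_patch($old, $patch), "\n";   # AAAAXXXXCC
```

Since every delta is taken against the base, one call like this per file is enough; there is no chain to walk.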