---
title: 'Urn: Exploring an SSD-friendly VCS architecture'
date: 2026-04-20
layout: post
---

Git takes up 1819 inodes and 59 MB to track a 36.59 MB repository pre-GC. GC collected 1514 inodes. Packing only reclaimed 6 MB. Can we do better?

The PoC implements status, add, commit, log, show, and diff, and supports symlinks and binary files. The architecture tries to balance speed, memory, and storage. It prioritizes SSD longevity (sequential reads/writes, reduced TBW/WA) to the extent Perl and the OS let us.

A sorted index tracks files. Staging an object copies it to the staging area and adds its path, mtime, size, and content hash (SHA-1) to the index:

```
my $p = $wrk_entry->{path};
my $current_hash = hash_file_content($p);

# Mirror the working-tree path under the staging area.
my $stg_path = File::Spec->catfile(TMP_DIR, $p);
make_path(dirname($stg_path));

# Recreate symlinks as symlinks; copy regular files.
(-l $p) ? symlink(readlink($p), $stg_path) : copy($p, $stg_path);

# Index record: staged hash, c_hash, b_hash, mtime, size, path.
printf $out "%-40s\t%-40s\t%-40s\t%-12d\t%-10d\t%s\n",
    $current_hash, $idx_entry->{c_hash}, $idx_entry->{b_hash},
    $wrk_entry->{mtime}, $wrk_entry->{size}, $p;
```

Urn doesn't use delta chains or a CAS. It stores one copy of a file (the base). Revisions are recorded as patches relative to the base. When the patch outgrows the file, urn snapshots that file and tracks future modifications relative to the new base.

Commits record the directory structure and patchset. Patches are stored in a tarball to minimize inode usage. Tarballs larger than 512 bytes are gzipped; compressing anything smaller isn't worth the effort. Trees are deduplicated using the content hash, so multiple commits can point to a single tree.

The diff and show commands look up the revision, find its tree and patchset in the object store, and apply the patch. The single-patch model keeps checkout at O(1) patch applications, since any revision is the base plus one patch, and it keeps a single corrupt patch from corrupting the rest of the history.

The external tools used (sort, diff, patch, tar, gzip) are all in the base system. Urn talks to them through pipes and streams to keep the memory footprint low. Most operations are executed in memory.
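
The base-and-patch rule and the gzip threshold described earlier can be sketched in a few lines. This is only an illustration, not urn's actual code: `storage_action` and `should_gzip` are hypothetical names, and the sketch assumes that "outgrows" means the patch is larger in bytes than the current file. The 512-byte threshold is the one stated in the text.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Threshold from the text: tarballs larger than 512 bytes are gzipped.
use constant GZIP_MIN => 512;

# Hypothetical helper: decide how to record a new revision of one file.
# Assumption: "the patch outgrows the file" means patch bytes > file bytes.
sub storage_action {
    my ($file_size, $patch_size) = @_;
    return $patch_size > $file_size ? 'snapshot' : 'patch';
}

# Hypothetical helper: compress only when the tarball exceeds the threshold.
sub should_gzip {
    my ($tar_size) = @_;
    return $tar_size > GZIP_MIN ? 1 : 0;
}

# Small patch against a 4 KB file: keep patching against the existing base.
print storage_action(4096, 300), "\n";
# Patch larger than the file itself: store a new base instead.
print storage_action(4096, 5000), "\n";
# Exactly 512 bytes is not "larger than 512", so it stays uncompressed.
print should_gzip(512) ? "gzip" : "plain", "\n";
```

Run as-is, this prints `patch`, `snapshot`, and `plain`. Comparing sizes rather than line counts keeps the decision cheap, but the real criterion may differ.
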
MEM_LIMIT can be used to fall back to disk when working with large repositories:

```
use constant MEM_LIMIT => 64 * 1024 * 1024;
use constant CHUNK_LEN => 8192;
use constant IO_LAYER  => ":raw:perlio(layer=" . CHUNK_LEN . ")";

if (!$use_disk) {
    # Everything fit in memory: sort in place and hand back an iterator.
    @buf = sort @buf;
    return sub {
        my $line = shift @buf;
        return unless $line;
        chomp $line;
        my ($p, $m, $s) = split(/\t/, $line);
        return { path => $p, mtime => $m, size => $s };
    };
} else {
    # Spilled to disk: flush the remainder and stream it through sort(1).
    $flush->() if @buf;
    close $tmp_fh;
    open(my $sort_fh, "-|", "sort", "-t", "\t", "-k1,1", $tmp_path) or die $!;
    return sub {
        my $line = <$sort_fh>;
        unless ($line) { close $sort_fh; return; }
        chomp $line;
        # Same record layout as the in-memory branch: path, mtime, size.
        my ($p, $m, $s) = split(/\t/, $line);
        return { path => $p, mtime => $m, size => $s };
    };
}
```

Index and tree processing is O(N), using a two-finger walk. To keep I/O transparent, both are streamed line by line through carefully calibrated buffers instead of mmap.

## Benchmarks

Performed on a T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:

### Impact of repository size

Small repo:

```
=============================================================
BENCHMARK: 200 files @ 20 depth
=============================================================
ACTION: Status
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.10s                | 0.00s
Max RSS         | 0.02 MB              | 0.00 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 6                    | 27
Repo size       | 20 KB                | 116 KB
-------------------------------------------------------------
ACTION: Add
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.16s                | 0.17s
Max RSS         | 0.02 MB              | 0.00 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 225                  | 360
Repo size       | 3700 KB              | 3348 KB
-------------------------------------------------------------
ACTION: Commit
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.17s                | 0.04s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 347                  | 397
Repo size       | 4212 KB              | 3496 KB
-------------------------------------------------------------
ACTION: Status(Clean)
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.10s                | 0.01s
Max RSS         | 0.02 MB              | 0.00 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 347                  | 397
Repo size       | 4212 KB              | 3496 KB
-------------------------------------------------------------
```

Larger repo:

```
=============================================================
BENCHMARK: 5000 files @ 50 depth
=============================================================
ACTION: Status
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.26s                | 0.00s
Max RSS         | 0.02 MB              | 0.00 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 6                    | 27
Repo size       | 20 KB                | 116 KB
-------------------------------------------------------------
ACTION: Add
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 2.82s                | 4.62s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 5055                 | 5284
Repo size       | 89444 KB             | 70360 KB
-------------------------------------------------------------
ACTION: Commit
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 1.18s                | 0.93s
Max RSS         | 0.03 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 5264                 | 5342
Repo size       | 91620 KB             | 70592 KB
-------------------------------------------------------------
ACTION: Status (Clean)
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.34s                | 0.10s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 5264                 | 5342
Repo size       | 91620 KB             | 70592 KB
-------------------------------------------------------------
```

### Impact of commits over time

Each commit modifies 2% of files. Modifications simulate small patches: a few lines added and a few lines deleted.

```
=============================================================
HISTORY BENCHMARK: 1000 files (100 commits)
=============================================================
SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.34s                | 0.18s
Max RSS         | 0.02 MB              | 0.01 MB
Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
Inodes          | 1302                 | 2122
Repo Size       | 19220 KB             | 21944 KB
-------------------------------------------------------------
SNAPSHOT: Commit #40
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.41s                | 0.11s
Max RSS         | 0.02 MB              | 0.01 MB
Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
Inodes          | 1342                 | 2924
Repo Size       | 19380 KB             | 28848 KB
-------------------------------------------------------------
SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.41s                | 0.09s
Max RSS         | 0.02 MB              | 0.01 MB
Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
Inodes          | 1383                 | 3719
Repo Size       | 19544 KB             | 35796 KB
-------------------------------------------------------------
SNAPSHOT: Commit #80
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.42s                | 0.11s
Max RSS         | 0.02 MB              | 0.01 MB
Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
Inodes          | 1424                 | 4532
Repo Size       | 19708 KB             | 42868 KB
-------------------------------------------------------------
SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.40s                | 0.10s
Max RSS         | 0.02 MB              | 0.01 MB
Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
Inodes          | 1464                 | 5341
Repo Size       | 19868 KB             | 49840 KB
-------------------------------------------------------------
```

Overall, git has the speed and memory advantage. Curiously though, on a cold start with 5000 files, urn's add + commit time (4.00s) beats git's (5.55s), while git's aggressive zlib compression beats urn's disk usage.

The performance profile over many commits tells a different story, however. Git's optimized C core eventually and consistently outperforms urn. Urn's disk usage, which is what it was optimized for, is more stable than git's. In the 80 commits between snapshots #20 and #100, git wrote 27 MB of data to the disk, while urn only wrote 0.6 MB. Git's inode use exploded from 2,122 to 5,341. Urn only went from 1,302 to 1,464.

Verdict: viable.
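
The growth figures quoted above follow from the commit #20 and commit #100 snapshots. A minimal sketch of the arithmetic, assuming 1 MB = 1024 KB and treating repository growth as a proxy for data written; the `%snap` table and `growth` helper are illustrative names, with values copied from the history benchmark:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# [repo size in KB, inode count] at commits #20 and #100,
# copied from the history benchmark output.
my %snap = (
    urn => { 20 => [19220, 1302], 100 => [19868, 1464] },
    git => { 20 => [21944, 2122], 100 => [49840, 5341] },
);

# Growth between commit #20 and commit #100: returns (MB, inodes).
sub growth {
    my ($tool) = @_;
    my ($kb0, $ino0) = @{ $snap{$tool}{20} };
    my ($kb1, $ino1) = @{ $snap{$tool}{100} };
    return (($kb1 - $kb0) / 1024, $ino1 - $ino0);
}

for my $tool (qw(git urn)) {
    my ($mb, $inodes) = growth($tool);
    printf "%s: +%.1f MB, +%d inodes over 80 commits\n", $tool, $mb, $inodes;
}
```

This prints `git: +27.2 MB, +3219 inodes over 80 commits` and `urn: +0.6 MB, +162 inodes over 80 commits`.
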