---
title: Built and benchmarked Urn against Git
date: 2026-04-30
layout: post
---

Implemented init, status, add, commit, log, show, and diff. Tracks regular files and symlinks. Didn't bother with collaborative workflows.

Replaced the initial work tree mirroring with symlinks to a path-sorted index; minimizes inode churn and avoids walking directories on every command.

Index tracks paths, mtimes, sizes, and SHA-1 hashes of staged, committed, and base files. Like Git, used mtime and size to skip entries that didn't change. Excluded file permissions for now.

Used a two-finger walk with the index to scan work/commit trees. Linear index access trades random-access speed for sequential IO and keeps the memory footprint low.

Operations run in memory, using text streams and pipes wherever possible. Left MEM_LIMIT configurable to fall back to disk for large repositories:

```
# On first flush, open a fully buffered temp file; after that, just append.
my $flush = sub {
    if (!$use_disk) {
        ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
        $tmp_fh->setvbuf(undef, POSIX::_IOFBF(), $chunk_size);
        binmode $tmp_fh, ":raw";
        $use_disk = 1;
    }
    print $tmp_fh @buf;
};

push @buf, $line;
$buf_size += length($line);
$tot_size += length($line);

# Spill once the total passes MEM_LIMIT, then again per chunk.
if ((!$use_disk && $tot_size > MEM_LIMIT) ||
    ($use_disk && $buf_size > $chunk_size)) {
    $flush->();
}
```

Commits save staged files, trees, and the deltas to the object store. Bundled deltas into tarballs to conserve inodes. Gzipped objects larger than 512 bytes (length of tar + gzip headers). Object store is content-addressable.

Deltas target the original file (base). Subsequent versions are reconstructed via a single patch; no chains. When the delta exceeds the rebase threshold, the file becomes the new base.

Avoiding frequent rebases is key. Diff output is bloated but compresses well. Set rebase threshold to 1.4 expecting 30-40% compression ratio.

Unix diff doesn't compute binary deltas. Rolled a basic binary diff to stay in the base system. Works well enough except for small changes that shift bytes.

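
To show why shifted bytes hurt, here is a minimal sketch of a positional binary diff. It is an assumption about what a "basic" binary diff might look like, not Urn's actual code; `bin_diff`, `bin_patch`, and `delta_bytes` are hypothetical names. It compares two byte strings offset by offset, so an in-place edit yields a tiny delta, while a single inserted byte misaligns everything after it.

```perl
use strict;
use warnings;

# Hypothetical positional diff: record each run of differing bytes
# as [offset, replacement bytes].
sub bin_diff {
    my ($base, $new) = @_;
    my @hunks;
    my $common = length($base) < length($new) ? length($base) : length($new);
    my $i = 0;
    while ($i < $common) {
        if (substr($base, $i, 1) eq substr($new, $i, 1)) { $i++; next; }
        my $start = $i;
        $i++ while $i < $common && substr($base, $i, 1) ne substr($new, $i, 1);
        push @hunks, [$start, substr($new, $start, $i - $start)];
    }
    # Bytes past the end of the base travel as one trailing hunk.
    push @hunks, [$common, substr($new, $common)] if length($new) > $common;
    return { len => length($new), hunks => \@hunks };
}

# Rebuild the new version from the base plus one delta (no chains).
sub bin_patch {
    my ($base, $delta) = @_;
    my $out = substr($base, 0, $delta->{len});
    $out .= "\0" x ($delta->{len} - length($out));   # pad if the file grew
    substr($out, $_->[0], length($_->[1])) = $_->[1] for @{ $delta->{hunks} };
    return $out;
}

# Payload size of a delta, ignoring offset bookkeeping.
sub delta_bytes {
    my ($delta) = @_;
    my $n = 0;
    $n += length($_->[1]) for @{ $delta->{hunks} };
    return $n;
}

# 1000-byte base in which no two neighbouring bytes are equal.
my $base = join '', map { chr($_ % 256) } 0 .. 999;

my $edited = $base;
substr($edited, 500, 1) = "\xFF";     # overwrite one byte in place
my $shifted = "X" . $base;            # insert one byte at the front

print "in-place edit: ", delta_bytes(bin_diff($base, $edited)),  " bytes\n";  # 1
print "shifted bytes: ", delta_bytes(bin_diff($base, $shifted)), " bytes\n";  # 1001
```

In the shifted case the delta is as large as the file itself, which is the kind of change that would push a file toward the rebase threshold.
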
Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8):

```
=============================================================
 COMMIT BENCHMARK: 1000 files (100 commits)
 CONDITIONS: Depth=2, Files Mod=0.5%, Line Mod=5%
 INITIAL REPO SIZE: 17332 KB
=============================================================

SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            |                0.29s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1300 |                 1425
Repo size       |              6836 KB |              8296 KB
-------------------------------------------------------------

SNAPSHOT: Commit #40
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            |                0.29s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1340 |                 1566
Repo size       |              7332 KB |              9268 KB
-------------------------------------------------------------

SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            |                0.35s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1381 |                 1706
Repo size       |              7896 KB |             10236 KB
-------------------------------------------------------------

SNAPSHOT: Commit #80
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            |                0.35s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1421 |                 1847
Repo size       |              8456 KB |             11200 KB
-------------------------------------------------------------

SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            |                0.35s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1462 |                 1987
Repo size       |              9020 KB |             12168 KB
-------------------------------------------------------------

AFTER GIT GC
-------------------------------------------------------------
Final Size      |              9020 KB |              3812 KB
Final Inodes    |                 1462 |                   41
-------------------------------------------------------------

TOTAL URN REBASES: 0
```

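
For a rough per-commit view of the snapshots above, this small sketch averages growth between commit #20 and commit #100. The figures are copied from the benchmark output; the `per_commit` helper and the `%snap` layout are illustrative only.

```perl
use strict;
use warnings;

# [value at commit #20, value at commit #100], taken from the snapshots.
my %snap = (
    urn => { kb => [6836, 9020],  inodes => [1300, 1462] },
    git => { kb => [8296, 12168], inodes => [1425, 1987] },
);
my $commits = 100 - 20;

# Average growth per commit across the measured window.
sub per_commit {
    my ($first, $last) = @_;
    return ($last - $first) / $commits;
}

for my $tool (qw(urn git)) {
    printf "%-3s  %.1f KB/commit  %.1f inodes/commit\n",
        $tool,
        per_commit(@{ $snap{$tool}{kb} }),
        per_commit(@{ $snap{$tool}{inodes} });
}
# urn  27.3 KB/commit  2.0 inodes/commit
# git  48.4 KB/commit  7.0 inodes/commit
```

These averages cover the pre-GC numbers only; the post-GC figures for Git are a separate data point.
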

Git wins on speed and memory. On storage, Urn shows more promise. Git wrote 12 MB to track a 17 MB repository; Urn wrote 9 MB. Over 80 commits (#20 to #100), Git's inode count grew by 562, from 1,425 to 1,987. Urn's crept from 1,300 to 1,462.

Then git gc ran. Inodes: 41. Space recovered: 8.4 MB.

Urn's sequential IO and reduced write frequency are theoretically gentler on NAND flash. Git's dramatic GC pass (12 MB → 3.8 MB) incurs write amplification Urn likely avoids. Precise impact on SSD TBW and write amplification, however, remains unknown.

Commit: 79d9ec2