From 5c6536754ef175ac6bbf628de16e9491940a9d48 Mon Sep 17 00:00:00 2001
From: Sadeep Madurange
Date: Wed, 29 Apr 2026 17:10:11 +0800
Subject: Fixed benchmarks and improved prose in VCS post.

---
 _log/vcs-1.md | 124 ++++++++++++++++++++++++++++------------------------------
 1 file changed, 59 insertions(+), 65 deletions(-)

diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index 3b60ff7..2611432 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -1,23 +1,22 @@
 ---
-title: Built an experimental SSD-friendly VCS
-date: 2026-04-23
+title: Built and benchmarked Urn against Git
+date: 2026-04-30
 layout: post
 ---
 
 Implemented init, status, add, commit, log, show, and diff. Tracks regular
 files, symlinks. Didn't bother with collaborative workflows.
 
-Moved away from the initial work tree mirroring with symlinks to a path-sorted
-index to minimize inode churn and opening directories on every status/add
-command.
+Replaced the initial work tree mirroring with symlinks to a path-sorted index;
+Minimizes inode churn; Avoids walking directories on every command.
 
-Implemented the index to track staged, commit, and base SHA-1 hashes, mtime,
-and size. Like git, used mtime and size to skip entries that didn't change.
-Excluded file permissions.
+Index tracks paths, mtimes, sizes, and SHA-1 hashes of staged, committed, and
+base files. Like Git, used mtime and size to skip entries that didn't change.
+Excluded file permissions for now.
 
-Designed work/commit tree scans around a two-finger walk with the index. Linear
-index access trades random-access speed for sequential IO; keeps memory
-footprint low.
+Used a two-finger walk with the index to scan work/commit trees. Linear index
+access trades random-access speed for sequential IO; keeps memory footprint
+low.
 
 Operations run in memory, using text streams and pipes wherever possible.
 Left MEM_LIMIT configurable to fall back to disk for large repositories:
@@ -43,112 +42,107 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) ||
 }
 ```
 
-Implemented the commit command to atomically save (rename) staged files, the
-tree, and the deltas to the object store. Bundled deltas into tarballs to
-conserve inodes, and gzipped tarballs larger than 512 bytes. Objects in the
-store are content-addressable.
+Commits save staged files, trees, and the deltas to the object store. Bundled
+deltas into tarballs to conserve inodes. Gzipped objects larger than 512 bytes
+(length of tar + gzip headers). Object store is content-addressable.
 
-Computed deltas against the first version of a file (base) to simplify
-reconstruction via application of a single patch instead of delta chains. When
-the delta outgrows the file, the file becomes the new base.
+Deltas target the original file (base). Subsequent versions are reconstructed
+via one patch—no chains. When the delta exceeds the rebase threshold, the file
+becomes the new base.
 
-Unix diff doesn't compute binary deltas. Rolled a basic binary diff that works
-well enough except for small changes that shift bytes:
+Avoiding frequent rebases is key. Diff output is bloated but compresses well.
+Set rebase threshold to 1.4 expecting 30-40% compression ratio.
 
-```
-my $patch = pack("Q", $new_size);
-while (1) {
-    my $read_new = sysread($f_new, my $buf_new, $blk_size);
-    my $read_old = sysread($f_old, my $buf_old, $blk_size);
-    last if !$read_new && !$read_old;
-
-    # If blocks differ, record the change
-    if (($buf_new // '') ne ($buf_old // '')) {
-        # Format: Offset (Q), Length (L), raw data
-        $patch .= pack("QL", $offset, length($buf_new)) . $buf_new;
-    }
-    $offset += $blk_size;
-}
-```
+Unix diff doesn't compute binary deltas. Rolled a basic binary diff to stay in
+the base system. Works well enough except for small changes that shift bytes.
 
-Benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
+Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8):
 =============================================================
- REBASE BENCHMARK: 1000 files (100 commits)
- CONDITIONS: Depth=2, Files Mod=5%, Change=50%
- INITIAL RAW DATA SIZE: 16976 KB
+ COMMIT BENCHMARK: 1000 files (100 commits)
+ CONDITIONS: Depth=2, Files Mod=0.5%, Line Mod=5%
+ INITIAL REPO SIZE: 17332 KB
 =============================================================
 
 SNAPSHOT: Commit #20
 -------------------------------------------------------------
 METRIC          | URN                  | GIT                 
 ----------------+----------------------+---------------------
-Time            |                0.29s |                0.05s
+Time            |                0.29s |                0.03s
 Max RSS         |              0.02 MB |              0.01 MB
 Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 1578 |                 2334
-Repo size       |             20404 KB |             19380 KB
+Inodes          |                 1300 |                 1425
+Repo size       |              6836 KB |              8296 KB
 -------------------------------------------------------------
 
 SNAPSHOT: Commit #40
 -------------------------------------------------------------
 METRIC          | URN                  | GIT                 
 ----------------+----------------------+---------------------
-Time            |                0.54s |                0.05s
+Time            |                0.29s |                0.03s
 Max RSS         |              0.02 MB |              0.01 MB
 Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 1607 |                 3374
-Repo size       |             20520 KB |             23788 KB
+Inodes          |                 1340 |                 1566
+Repo size       |              7332 KB |              9268 KB
 -------------------------------------------------------------
 
 SNAPSHOT: Commit #60
 -------------------------------------------------------------
 METRIC          | URN                  | GIT                 
 ----------------+----------------------+---------------------
-Time            |                0.31s |                0.05s
+Time            |                0.35s |                0.03s
 Max RSS         |              0.02 MB |              0.01 MB
 Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 1635 |                 4414
-Repo size       |             20632 KB |             28196 KB
+Inodes          |                 1381 |                 1706
+Repo size       |              7896 KB |             10236 KB
 -------------------------------------------------------------
 
 SNAPSHOT: Commit #80
 -------------------------------------------------------------
 METRIC          | URN                  | GIT                 
 ----------------+----------------------+---------------------
-Time            |                0.29s |                0.05s
+Time            |                0.35s |                0.03s
 Max RSS         |              0.02 MB |              0.01 MB
 Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 1664 |                 5454
-Repo size       |             20748 KB |             32596 KB
+Inodes          |                 1421 |                 1847
+Repo size       |              8456 KB |             11200 KB
 -------------------------------------------------------------
 
 SNAPSHOT: Commit #100
 -------------------------------------------------------------
 METRIC          | URN                  | GIT                 
 ----------------+----------------------+---------------------
-Time            |                0.54s |                0.10s
+Time            |                0.35s |                0.03s
 Max RSS         |              0.02 MB |              0.01 MB
 Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 1693 |                 6495
-Repo size       |             20864 KB |             37008 KB
+Inodes          |                 1462 |                 1987
+Repo size       |              9020 KB |             12168 KB
 -------------------------------------------------------------
 
-TOTAL URN REBASES: 273
+AFTER GIT GC
+-------------------------------------------------------------
+Final Size      |              9020 KB |              3812 KB
+Final Inodes    |                 1462 |                   41
+-------------------------------------------------------------
+
+TOTAL URN REBASES: 0
 
-Git wins on speed and memory. On small repositories, Urn is competitive.
+Git wins on speed and memory.
+
+On storage, Urn shows more promise. Git wrote 12 MB to track a 17 MB
+repository; Urn wrote 9 MB. Over 80 commits, Git's inode consumption grew by 562.
+Urn's crept from 1,300 to 1,462.
+
+Then happened the GC. Inodes: 41. Space recovered: 8.4 MB.
 
-In high-revision workloads with modest per-file churn, Urn beats git on
-storage. Over 80 commits, git wrote 17 MB to track a 17 MB repo. Urn only wrote
-0.5 MB despite 273 rebases. Git's inode count went from 2,334 to 6,495. Urn's
-went from 1,578 to 1,693.
+Urn's sequential IO and reduced write frequency are theoretically gentler on
+the NAND gates. Git's dramatic GC pass (12 MB → 3.8 MB) incurs write
+amplification Urn likely avoids.
 
-Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git
-used up 1819 inodes and 59 MB pre-GC. After GC, the inode count dropped to
-1514, but the size only shrank by 6 MB.
+Precise impact on SSD TBW and write amplification, however, remains unknown.
 
 Commit: 49ae774
+href="https://git.asciimx.com/urn/commit/?id=79d9ec2bdef0a82172fa0aa56f12004bef206c04"
+class="external" target="_blank" rel="noopener noreferrer">79d9ec2
--
cgit v1.2.3
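
Note on the two-finger walk mentioned in the post: the patch describes the technique but does not show it. The following is a minimal sketch of how such a walk might look, not Urn's actual code. The function name `two_finger_walk` and the `[path, mtime, size]` entry layout are assumptions made for illustration. Both lists are assumed to be sorted by path with the same byte-wise ordering that `cmp` uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from Urn): compare a path-sorted index
# against a path-sorted work tree listing in a single linear pass.
# Each entry is assumed to be [path, mtime, size].
sub two_finger_walk {
    my ($index, $tree) = @_;
    my ($i, $j) = (0, 0);
    my (@added, @deleted, @modified);

    while ($i < @$index && $j < @$tree) {
        my ($ip, $im, $is) = @{ $index->[$i] };
        my ($tp, $tm, $ts) = @{ $tree->[$j] };
        my $cmp = $ip cmp $tp;

        if ($cmp < 0) {        # path only in the index: deleted
            push @deleted, $ip;
            $i++;
        } elsif ($cmp > 0) {   # path only in the work tree: added
            push @added, $tp;
            $j++;
        } else {               # path in both: mtime and size decide
            push @modified, $ip if $im != $tm || $is != $ts;
            $i++;
            $j++;
        }
    }

    # Drain whichever side still has entries left.
    push @deleted, map { $_->[0] } @{$index}[$i .. $#$index];
    push @added,   map { $_->[0] } @{$tree}[$j .. $#$tree];

    return (\@added, \@deleted, \@modified);
}

# Usage with made-up entries.
my @index = (["a.txt", 100, 10], ["b.txt", 100, 20], ["d.txt", 100, 5]);
my @tree  = (["a.txt", 100, 10], ["b.txt", 200, 25], ["c.txt", 150, 7]);
my ($added, $deleted, $modified) = two_finger_walk(\@index, \@tree);
print "added: @$added\n";
print "deleted: @$deleted\n";
print "modified: @$modified\n";
```

Each list is read once, front to back, so neither side needs to be held in a hash. That is the trade the post describes: sequential access in exchange for random-access speed.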