From 76ae41e3cb961183a98813d593a76e5f5154aad7 Mon Sep 17 00:00:00 2001
From: Sadeep Madurange
Date: Fri, 1 May 2026 00:38:40 +0800
Subject: Polished VCS prose.

---
 _log/vcs-1.md | 85 +++++++++++++++++++++--------------------------------------
 1 file changed, 30 insertions(+), 55 deletions(-)

diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index 2611432..f7520df 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -1,24 +1,36 @@
 ---
 title: Built and benchmarked Urn against Git
-date: 2026-04-30
+date: 2026-05-01
 layout: post
 ---
 
-Implemented init, status, add, commit, log, show, and diff. Tracks regular
-files, symlinks. Didn't bother with collaborative workflows.
+Implemented init, status, add, commit, log, show, and diff commands. Depends
+only on OpenBSD base system tools. Didn't bother with collaborative workflows.
 
-Replaced the initial work tree mirroring with symlinks to a path-sorted index;
-Minimizes inode churn; Avoids walking directories on every command.
+Initial design mirrored the work tree using symlinks. Using filesystem as a
+database felt clever, but walking directories on every command and the inode
+churn were untenable. Replaced the symlink architecture with a path-sorted
+index.
 
-Index tracks paths, mtimes, sizes, and SHA-1 hashes of staged, committed, and
-base files. Like Git, used mtime and size to skip entries that didn't change.
-Excluded file permissions for now.
+The index tracks path, mtime, size, and SHA-1 hashes of staged, committed, and
+base files. Hashing is skipped when mtime and size are unchanged. If the file
+and the index share the same timestamp, it's rehashed to catch sub-second
+changes.
 
-Used a two-finger walk with the index to scan work/commit trees. Linear index
-access trades random-access speed for sequential IO; keeps memory footprint
+Implemented directory scans as a two-finger walk with the index. Linear index
+access trades random-access speed for sequential IO and keeps memory footprint
 low.
 
-Operations run in memory, using text streams and pipes wherever possible. Left
+Commits save staged files, trees, and deltas to the content-addressable object
+store. Bundled deltas into tarballs to conserve inodes. Gzipped objects larger
+than 512 bytes. The threshold was arbitrary. Did not tune further.
+
+Deltas, computed using diff, target the original file. Subsequent versions are
+reconstructed via a single patch—no chains. When the delta exceeds the rebase
+threshold, the file becomes the new base. Diff output is bloated but compresses
+well, so rebase threshold is set to 1.4, assuming a 30-40% compression ratio.
+
+Commands run in memory, using text streams and pipes wherever possible. Left
 MEM_LIMIT configurable to fall back to disk for large repositories:
 
 ```
@@ -42,20 +54,6 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) ||
 }
 ```
 
-Commits save staged files, trees, and the deltas to the object store. Bundled
-deltas into tarballs to conserve inodes. Gzipped objects larger than 512 bytes
-(length of tar + gzip headers). Object store is content-addressable.
-
-Deltas target the original file (base). Subsequent versions are reconstructed
-via one patch—no chains. When the delta exceeds the rebase threshold, the file
-becomes the new base.
-
-Avoiding frequent rebases is key. Diff output is bloated but compresses well.
-Set rebase threshold to 1.4 expecting 30-40% compression ratio.
-
-Unix diff doesn't compute binary deltas. Rolled a basic binary diff to stay in
-the base system. Works well enough except for small changes that shift bytes.
-
 Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8):
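The new prose in the post describes three ideas. Directory scans are a two-finger walk over a path-sorted index. Hashing is skipped when mtime and size match. An entry that shares the index's timestamp is rehashed. The following is a minimal Python sketch of those ideas and is not Urn's code. The names `IndexEntry`, `Stat`, `is_modified`, and `walk` are hypothetical. The sketch assumes that timestamps are whole seconds and that the index and the directory listing are sorted with the same ordering.

```python
# Sketch only: hypothetical names and assumptions, not Urn's implementation.
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class IndexEntry:
    path: str
    mtime: int  # whole seconds, recorded when the file was staged (assumption)
    size: int
    sha1: str   # hex SHA-1 of the staged content


@dataclass
class Stat:
    mtime: int
    size: int


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def is_modified(entry: IndexEntry, st: Stat,
                read: Callable[[str], bytes], index_mtime: int) -> bool:
    """Return True if the work-tree file differs from the indexed version."""
    if st.size != entry.size:
        return True   # different length means different content; no hash needed
    if st.mtime == entry.mtime and st.mtime < index_mtime:
        return False  # stat unchanged and older than the index: skip hashing
    # mtime changed, or the file shares the index's timestamp (an edit within
    # the same second is invisible to stat), so compare content hashes.
    return sha1_hex(read(entry.path)) != entry.sha1


def walk(index: List[IndexEntry], tree: List[Tuple[str, Stat]],
         read: Callable[[str], bytes], index_mtime: int) -> List[Tuple[str, str]]:
    """One linear pass over two path-sorted sequences."""
    i = j = 0
    changes: List[Tuple[str, str]] = []
    while i < len(index) or j < len(tree):
        if j == len(tree) or (i < len(index) and index[i].path < tree[j][0]):
            changes.append(("deleted", index[i].path))   # indexed, not on disk
            i += 1
        elif i == len(index) or tree[j][0] < index[i].path:
            changes.append(("untracked", tree[j][0]))    # on disk, not indexed
            j += 1
        else:                                            # same path on both sides
            path, st = tree[j]
            if is_modified(index[i], st, read, index_mtime):
                changes.append(("modified", path))
            i += 1
            j += 1
    return changes


if __name__ == "__main__":
    files = {"a.txt": b"alpha", "b.txt": b"beta!", "d.txt": b"delta"}
    index = [IndexEntry("a.txt", 100, 5, sha1_hex(b"alpha")),
             IndexEntry("b.txt", 100, 4, sha1_hex(b"beta")),
             IndexEntry("c.txt", 100, 3, sha1_hex(b"sea"))]
    tree = [("a.txt", Stat(100, 5)), ("b.txt", Stat(150, 5)),
            ("d.txt", Stat(150, 5))]
    print(walk(index, tree, files.__getitem__, index_mtime=200))
    # [('modified', 'b.txt'), ('deleted', 'c.txt'), ('untracked', 'd.txt')]
```

Each sequence is read front to back exactly once. This matches the post's point that sequential access keeps the memory footprint low.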
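The storage paragraphs describe a content-addressable object store. Objects larger than 512 bytes are gzipped. Deltas are computed against a base, and the file is rebased once a delta exceeds 1.4. The following is a hedged sketch of that policy, and several parts of it are assumptions:

- The threshold compares raw delta size to the new file's size.
- Python's `difflib` stands in for diff(1).
- Tarball bundling is not modelled.
- `put_object`, `get_object`, and `should_rebase` are hypothetical names.

One reading of the 30-40% figure is that gzip shrinks diff output by that much. On that reading, a delta at 1.4x the file size compresses to roughly 0.84-0.98x the file (1.4 × 0.6 to 1.4 × 0.7). That is close to the cost of storing the file outright.

```python
# Sketch only: hypothetical names and assumed semantics, not Urn's code.
import difflib
import gzip
import hashlib
from typing import Dict, Tuple

GZIP_MIN = 512          # from the post: objects larger than this are gzipped
REBASE_THRESHOLD = 1.4  # from the post; assumed to mean delta size / file size

Store = Dict[str, Tuple[bool, bytes]]  # sha1 -> (is_gzipped, blob)


def put_object(store: Store, data: bytes) -> str:
    """Store data under its SHA-1; identical content is stored only once."""
    key = hashlib.sha1(data).hexdigest()
    if key not in store:
        if len(data) > GZIP_MIN:
            store[key] = (True, gzip.compress(data))
        else:
            store[key] = (False, data)  # small objects are kept as-is
    return key


def get_object(store: Store, key: str) -> bytes:
    gzipped, blob = store[key]
    return gzip.decompress(blob) if gzipped else blob


def delta(base: str, new: str) -> str:
    """Unified diff from base to new; difflib stands in for diff(1)."""
    return "".join(difflib.unified_diff(base.splitlines(keepends=True),
                                        new.splitlines(keepends=True),
                                        "base", "new"))


def should_rebase(base: str, new: str) -> bool:
    """True when the raw delta is too large, so new should become the base."""
    d = delta(base, new)
    return len(d.encode()) > REBASE_THRESHOLD * len(new.encode())


if __name__ == "__main__":
    base = "".join(f"line {n}\n" for n in range(100))
    small_edit = base.replace("line 50\n", "line fifty\n")
    print(should_rebase(base, small_edit))  # False: keep patching the old base
    print(should_rebase("a\n", "b\n"))      # True: diff headers dwarf the file
```

Every version is diffed against the base directly, never against the previous version. Reconstruction is therefore one patch applied to the base, matching the post's "no chains".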
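The post's elided MEM_LIMIT code is only partly visible, as the condition `!$use_disk && $tot_size > MEM_LIMIT`. What follows is a hedged Python analogue, not Urn's code. It buffers in memory and moves everything to a temporary file the first time the running total would exceed the limit. `Spill` and the 64 MiB default are hypothetical.

```python
# Sketch only: Spill and the default limit are hypothetical, not Urn's code.
import io
import tempfile

MEM_LIMIT = 64 * 1024 * 1024  # hypothetical value; the post keeps it configurable


class Spill:
    """Accumulate bytes in memory; move to a temp file once the limit is crossed."""

    def __init__(self, limit: int = MEM_LIMIT) -> None:
        self.limit = limit
        self.size = 0
        self.use_disk = False
        self.buf = io.BytesIO()

    def write(self, data: bytes) -> None:
        if not self.use_disk and self.size + len(data) > self.limit:
            old = self.buf
            disk = tempfile.TemporaryFile()  # anonymous file, destroyed on close
            disk.write(old.getvalue())       # copy what was buffered so far
            old.close()
            self.buf = disk
            self.use_disk = True
        self.buf.seek(0, io.SEEK_END)        # always append, even after a read
        self.buf.write(data)
        self.size += len(data)

    def read_all(self) -> bytes:
        self.buf.seek(0)
        return self.buf.read()

    def close(self) -> None:
        self.buf.close()


if __name__ == "__main__":
    s = Spill(limit=10)
    s.write(b"12345")
    s.write(b"67890")  # exactly at the limit: still in memory
    s.write(b"!")      # crosses the limit: spills to disk
    print(s.use_disk, s.read_all())
    s.close()
```

Python's `tempfile.SpooledTemporaryFile(max_size=...)` provides the same roll-over behaviour in one call. The explicit class only mirrors the `use_disk` flag from the post's condition.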
@@ -76,17 +74,6 @@ Inodes          |                 1300 |                 1425
 Repo size       |              6836 KB |              8296 KB
 -------------------------------------------------------------
 
-SNAPSHOT: Commit #40
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                0.29s |                0.03s
-Max RSS         |              0.02 MB |              0.01 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 1340 |                 1566
-Repo size       |              7332 KB |              9268 KB
--------------------------------------------------------------
-
 SNAPSHOT: Commit #60
 -------------------------------------------------------------
 METRIC          | URN                  | GIT                 
@@ -98,17 +85,6 @@ Inodes          |                 1381 |                 1706
 Repo size       |              7896 KB |             10236 KB
 -------------------------------------------------------------
 
-SNAPSHOT: Commit #80
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                0.35s |                0.03s
-Max RSS         |              0.02 MB |              0.01 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 1421 |                 1847
-Repo size       |              8456 KB |             11200 KB
--------------------------------------------------------------
-
 SNAPSHOT: Commit #100
 -------------------------------------------------------------
 METRIC          | URN                  | GIT                 
@@ -131,17 +107,16 @@ TOTAL URN REBASES: 0
 
 Git wins on speed and memory. 
 
-On storage, Urn shows more promise. Git wrote 12 MB to track a 17 MB
-repository; Urn wrote 9 MB. Over 80 commits, Git's inode consumption grew by 562. 
-Urn's crept from 1,300 to 1,462.
+On storage, Urn shows promise. Git wrote 12 MB to track a 17 MB repository; Urn
+wrote 9 MB. Over 80 commits, Git's inode consumption grew by 562.  Urn's crept
+from 1,300 to 1,462.
 
-Then happened the GC. Inodes: 41. Space recovered: 8.4 MB.
+Then fell the GC hammer. Inodes: 41. Space recovered: 8.4 MB.
 
 Urn's sequential IO and reduced write frequency are theoretically gentler on
-the NAND gates. Git's dramatic GC pass (12 MB → 3.8 MB) incurs write
-amplification Urn likely avoids. 
-
-Precise impact on SSD TBW and write amplification, however, remains unknown. 
+NAND. Git's dramatic GC pass (12 MB → 3.8 MB) incurs SSD wear Urn likely
+avoids. Precise impact on TBW and write amplification, however, remains
+unknown. 
 
 Commit: