From c0e09507547ff80ff93d85989c8954fdcb5292f3 Mon Sep 17 00:00:00 2001 From: Sadeep Madurange Date: Fri, 24 Apr 2026 08:10:24 +0800 Subject: Tightened prose in VCS post. --- _log/vcs-1.md | 173 +++++++++++++--------------------------------------------- 1 file changed, 39 insertions(+), 134 deletions(-) (limited to '_log/vcs-1.md') diff --git a/_log/vcs-1.md b/_log/vcs-1.md index d6ae0b2..3b60ff7 100644 --- a/_log/vcs-1.md +++ b/_log/vcs-1.md @@ -1,5 +1,5 @@ --- -title: Implemented an experimental SSD-friendly VCS +title: Built an experimental SSD-friendly VCS date: 2026-04-23 layout: post --- @@ -7,47 +7,20 @@ layout: post Implemented init, status, add, commit, log, show, and diff. Tracks regular files, symlinks. Didn't bother with collaborative workflows. -Moved away from the initial work tree mirroring with symlinks to an index to -minimize inode churn and opening directories on every status/add command. +Moved away from the initial work tree mirroring with symlinks to a path-sorted +index to minimize inode churn and opening directories on every status/add +command. -Implemented path-sorted index to track staged, commit, and base SHA-1 hashes, -mtime, and size. Like git, mtime and size are used to skip entries that didn't -change. Excluded file permissions. +Implemented the index to track staged, commit, and base SHA-1 hashes, mtime, +and size. Like git, used mtime and size to skip entries that didn't change. +Excluded file permissions. -When a new file is committed, it’s saved in the object store as the base. When -the file changes, diff generates a patch against the base. If the patch is -larger than the file, the file becomes the new base. +Designed work/commit tree scans around a two-finger walk with the index. Linear +index access trades random-access speed for sequential IO; keeps memory +footprint low. -Unix diff doesn't compute binary deltas. Rolled a basic binary diff that works -well enough except for small changes that shift bytes: - -``` -my $patch = pack("Q", $new_size); -while (1) { - my $read_new = sysread($f_new, my $buf_new, $blk_size); - my $read_old = sysread($f_old, my $buf_old, $blk_size); - last if !$read_new && !$read_old; - - # If blocks differ, record the change - if (($buf_new // '') ne ($buf_old // '')) { - # Format: Offset (Q), Length (L), raw data - $patch .= pack("QL", $offset, length($buf_new)) . $buf_new; - } - $offset += $blk_size; -} -``` - -A commit is a tree (list of paths and their base hashes) and a patch set. -Patches are stored as tarballs. Gzipped tarballs larger than 512 bytes. -Objects in the store are content-addressable. To reconstruct a file, look up -the revision file, follow the base hash, and apply the patch. - -Status and add commands scan the work tree, sort entries by path, and perform a -two-finger walk with the index. Linear index access trades random-access speed -for sequential IO. - -Operations are performed in memory—often using text streams and pipes. -MEM_LIMIT can be used to fall back on the disk for large repositories: +Operations run in memory, using text streams and pipes wherever possible. Left +MEM_LIMIT configurable to fall back to disk for large repositories: ``` my $flush = sub { @@ -70,105 +43,37 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) || } ``` -Performed benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0: - -
-=============================================================
- BENCHMARK: 200 files @ 20 depth
-=============================================================
-
-ACTION: Status
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                0.10s |                0.00s
-Max RSS         |              0.02 MB |              0.00 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                    6 |                   27
-Repo size       |                20 KB |               116 KB
--------------------------------------------------------------
+Implemented the commit command to atomically save (rename) staged files, the
+tree, and the deltas to the object store. Bundled deltas into tarballs to
+conserve inodes, and gzipped tarballs larger than 512 bytes. Objects in the
+store are content-addressable.
 
-ACTION: Add
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                0.16s |                0.17s
-Max RSS         |              0.02 MB |              0.00 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                  225 |                  360
-Repo size       |              3700 KB |              3348 KB
--------------------------------------------------------------
+Computed deltas against the first version of a file (base) to simplify
+reconstruction via application of a single patch instead of delta chains. When
+the delta outgrows the file, the file becomes the new base. 
 
-ACTION: Commit
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                0.17s |                0.04s
-Max RSS         |              0.02 MB |              0.01 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                  347 |                  397
-Repo size       |              4212 KB |              3496 KB
--------------------------------------------------------------
-
-ACTION: Status(Clean)
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                0.10s |                0.01s
-Max RSS         |              0.02 MB |              0.00 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                  347 |                  397
-Repo size       |              4212 KB |              3496 KB
--------------------------------------------------------------
-
-=============================================================
- BENCHMARK: 5000 files @ 50 depth
-=============================================================
-
-ACTION: Status
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                0.26s |                0.00s
-Max RSS         |              0.02 MB |              0.00 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                    6 |                   27
-Repo size       |                20 KB |               116 KB
--------------------------------------------------------------
+Unix diff doesn't compute binary deltas. Rolled a basic binary diff that works
+well enough except for small changes that shift bytes:
 
-ACTION: Add
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                2.82s |                4.62s
-Max RSS         |              0.02 MB |              0.01 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 5055 |                 5284
-Repo size       |             89444 KB |             70360 KB
--------------------------------------------------------------
+```
+my $patch = pack("Q", $new_size); 
+while (1) {
+    my $read_new = sysread($f_new, my $buf_new, $blk_size);
+    my $read_old = sysread($f_old, my $buf_old, $blk_size);
+    last if !$read_new && !$read_old;
 
-ACTION: Commit
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                1.18s |                0.93s
-Max RSS         |              0.03 MB |              0.01 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 5264 |                 5342
-Repo size       |             91620 KB |             70592 KB
--------------------------------------------------------------
+    # If blocks differ, record the change
+    if (($buf_new // '') ne ($buf_old // '')) {
+        # Format: Offset (Q), Length (L), raw data
+        $patch .= pack("QL", $offset, length($buf_new)) . $buf_new;
+    }
+    $offset += $blk_size; 
+}
+```
 
-ACTION: Status (Clean)
--------------------------------------------------------------
-METRIC          | URN                  | GIT                 
-----------------+----------------------+---------------------
-Time            |                0.34s |                0.10s
-Max RSS         |              0.02 MB |              0.01 MB
-Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
-Inodes          |                 5264 |                 5342
-Repo size       |             91620 KB |             70592 KB
--------------------------------------------------------------
+Benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
 
+
 =============================================================
  REBASE BENCHMARK: 1000 files (100 commits)
  CONDITIONS: Depth=2, Files Mod=5%, Change=50%
@@ -241,8 +146,8 @@ storage. Over 80 commits, git wrote 17 MB to track a 17 MB repo. Urn only wrote
 went from 1,578 to 1,693.
 
 Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git
-used up 1819 inodes and 59 MB pre-GC. After GC, inode count dropped to 1514,
-but space only shrank by 6 MB.
+used up 1819 inodes and 59 MB pre-GC. After GC, the inode count dropped to
+1514, but the size only shrank by 6 MB.
 
 Commit: