From 42f6044b0bbcdf92fe2e0667dfc07c2f4cfaed69 Mon Sep 17 00:00:00 2001
From: Sadeep Madurange
Date: Thu, 23 Apr 2026 16:01:14 +0800
Subject: Wrote VCS post in log style.

---
 _log/vcs-1.md | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index 70531bc..c07d3bd 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -4,23 +4,22 @@ date: 2026-04-23
 layout: post
 ---
 
-Urn is an experimental VCS built to minimize SSD wear, write amplification, and
-inode churn, even when that costs CPU time.
+Implemented init, status, add, commit, log, show, and diff. Tracks regular
+files, symlinks. Didn't bother with collaborative workflows.
 
-Implemented init, status, add, commit, log, show, and diff. Handles text files,
-symlinks, and binary files. Collaborative workflows are out of scope.
+Moved away from the initial work tree mirroring with symlinks to an index to
+minimize inode churn and opening directories on every status/add command.
 
-Sorted index tracks file paths, SHA-1 hashes of staged files (staged hash),
-parent (commit hash), and base (base hash), mtime, and size. Permissions are
-not tracked. mtime and size are used to avoid computing hashes for files that
-didn't change.
+Implemented path-sorted index to track staged, commit, and base SHA-1 hashes,
+mtime, and size. Like git, mtime and size are used to skip entries that didn't
+change. Excluded file permissions.
 
-When a new file is committed, it's saved in the object store as the base. When
+When a new file is committed, it’s saved in the object store as the base. When
 the file changes, diff generates a patch against the base. If the patch is
 larger than the file, the file becomes the new base.
 
-Unix diff doesn't handle binary files well. Rolled a diff for binary files that
-works well enough except for small changes that shift bytes:
+Unix diff doesn't compute binary deltas. Rolled a basic binary diff that works
+well enough except for small changes that shift bytes:
 
 ```
 my $patch = pack("Q", $new_size);
@@ -38,15 +37,17 @@ while (1) {
 }
 ```
 
-Commits store sorted lists of paths (tree), base hashes, and patch sets in the
-object store. Patches are stored as tarballs. Tarballs larger than 512 bytes
-are gzipped. Objects are content-addressable. To reconstruct a file, look up
-the revision, follow the base hash and apply the patch.
+A commit is a tree (list of paths and their base hashes) and a patch set.
+Patches are stored as tarballs. Gzipped tarballs larger than 512 bytes.
+Objects in the store are content-addressable. To reconstruct a file, look up
+the revision file, follow the base hash, and apply the patch.
 
-Status and add commands scan the work tree, sort entries by path and performs a
-two-finger walk with the index to minimize random access. Operations are
-performed in memory — often using text streams and pipes. MEM_LIMIT can be used
-to fall back on the disk for large repositories:
+Status and add commands scan the work tree, sort entries by path, and perform a
+two-finger walk with the index. Linear index access trades random-access speed
+for sequential IO.
+
+Operations are performed in memory — often using text streams and pipes.
+MEM_LIMIT can be used to fall back on the disk for large repositories:
 
 ```
 my $flush = sub {
@@ -69,7 +70,7 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) ||
 }
 ```
 
-Benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
+Performed benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
 =============================================================
@@ -240,8 +241,8 @@ storage. Over 80 commits, git wrote 17 MB to track a 17 MB repo. Urn only wrote
 went from 1,578 to 1,693.
 
 Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git
-used up 1819 inodes and 59 MB pre-GC. After GC inode count dropped to 1514, but
-space only shrank by 6 MB.
+used up 1819 inodes and 59 MB pre-GC. After GC, inode count dropped to 1514,
+but space only shrank by 6 MB.
 
 Commit: