summaryrefslogtreecommitdiffstats
path: root/_log
diff options
context:
space:
mode:
authorSadeep Madurange <sadeep@asciimx.com>2026-04-24 08:10:24 +0800
committerSadeep Madurange <sadeep@asciimx.com>2026-04-24 09:13:15 +0800
commitc0e09507547ff80ff93d85989c8954fdcb5292f3 (patch)
treed77d99af23736e6105a1731a7d37b187d6c2173a /_log
parentbc38c10bfdd23fd51c7c3c320de1d84ac98b8dfd (diff)
downloadwww-c0e09507547ff80ff93d85989c8954fdcb5292f3.tar.gz
Tightened prose in VCS post.
Diffstat (limited to '_log')
-rw-r--r--_log/site-search.md5
-rw-r--r--_log/vcs-1.md173
2 files changed, 42 insertions, 136 deletions
diff --git a/_log/site-search.md b/_log/site-search.md
index 50de7b2..4f965f9 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -5,7 +5,7 @@ layout: post
---
Developed a suffix-array-based search engine for the site today. While a simple
-regex search was enough, I couldn't resist the technical elegance of a proper
+regex search was enough, couldn't resist the technical elegance of a proper
index.
Indexer: Implemented the indexer in Perl to crawl the HTML, lowercase the text,
@@ -97,7 +97,8 @@ using lock-file semaphores, and capped the query length (64 B) and the result
set (20). Mitigated XSS by HTML-escaping all output using HTML::Escape.
At my current pace of twelve articles a year, the search should work without
-intervention for the next 833 years. Penciled in the next release for 2859 AD.
+intervention for the next 833 years. Penciled in the next release for Anno
+Domini 2859.
Commit: <a
href="https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e"
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index d6ae0b2..3b60ff7 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -1,5 +1,5 @@
---
-title: Implemented an experimental SSD-friendly VCS
+title: Built an experimental SSD-friendly VCS
date: 2026-04-23
layout: post
---
@@ -7,47 +7,20 @@ layout: post
Implemented init, status, add, commit, log, show, and diff. Tracks regular
files, symlinks. Didn't bother with collaborative workflows.
-Moved away from the initial work tree mirroring with symlinks to an index to
-minimize inode churn and opening directories on every status/add command.
+Moved away from the initial work tree mirroring with symlinks to a path-sorted
+index to minimize inode churn and opening directories on every status/add
+command.
-Implemented path-sorted index to track staged, commit, and base SHA-1 hashes,
-mtime, and size. Like git, mtime and size are used to skip entries that didn't
-change. Excluded file permissions.
+Implemented the index to track staged, commit, and base SHA-1 hashes, mtime,
+and size. Like git, used mtime and size to skip entries that didn't change.
+Excluded file permissions.
-When a new file is committed, it’s saved in the object store as the base. When
-the file changes, diff generates a patch against the base. If the patch is
-larger than the file, the file becomes the new base.
+Designed work/commit tree scans around a two-finger walk with the index. Linear
+index access trades random-access speed for sequential IO; keeps memory
+footprint low.
-Unix diff doesn't compute binary deltas. Rolled a basic binary diff that works
-well enough except for small changes that shift bytes:
-
-```
-my $patch = pack("Q", $new_size);
-while (1) {
- my $read_new = sysread($f_new, my $buf_new, $blk_size);
- my $read_old = sysread($f_old, my $buf_old, $blk_size);
- last if !$read_new && !$read_old;
-
- # If blocks differ, record the change
- if (($buf_new // '') ne ($buf_old // '')) {
- # Format: Offset (Q), Length (L), raw data
- $patch .= pack("QL", $offset, length($buf_new)) . $buf_new;
- }
- $offset += $blk_size;
-}
-```
-
-A commit is a tree (list of paths and their base hashes) and a patch set.
-Patches are stored as tarballs. Gzipped tarballs larger than 512 bytes.
-Objects in the store are content-addressable. To reconstruct a file, look up
-the revision file, follow the base hash, and apply the patch.
-
-Status and add commands scan the work tree, sort entries by path, and perform a
-two-finger walk with the index. Linear index access trades random-access speed
-for sequential IO.
-
-Operations are performed in memory—often using text streams and pipes.
-MEM_LIMIT can be used to fall back on the disk for large repositories:
+Operations run in memory, using text streams and pipes wherever possible. Left
+MEM_LIMIT configurable to fall back to disk for large repositories:
```
my $flush = sub {
@@ -70,105 +43,37 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) ||
}
```
-Performed benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
-
-<pre class="pre-no-style">
-=============================================================
- BENCHMARK: 200 files @ 20 depth
-=============================================================
-
-ACTION: Status
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 0.10s | 0.00s
-Max RSS | 0.02 MB | 0.00 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 6 | 27
-Repo size | 20 KB | 116 KB
--------------------------------------------------------------
+Implemented the commit command to atomically save (rename) staged files, the
+tree, and the deltas to the object store. Bundled deltas into tarballs to
+conserve inodes, and gzipped tarballs larger than 512 bytes. Objects in the
+store are content-addressable.
-ACTION: Add
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 0.16s | 0.17s
-Max RSS | 0.02 MB | 0.00 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 225 | 360
-Repo size | 3700 KB | 3348 KB
--------------------------------------------------------------
+Computed deltas against the first version of a file (base) to simplify
+reconstruction via application of a single patch instead of delta chains. When
+the delta outgrows the file, the file becomes the new base.
-ACTION: Commit
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 0.17s | 0.04s
-Max RSS | 0.02 MB | 0.01 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 347 | 397
-Repo size | 4212 KB | 3496 KB
--------------------------------------------------------------
-
-ACTION: Status(Clean)
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 0.10s | 0.01s
-Max RSS | 0.02 MB | 0.00 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 347 | 397
-Repo size | 4212 KB | 3496 KB
--------------------------------------------------------------
-
-=============================================================
- BENCHMARK: 5000 files @ 50 depth
-=============================================================
-
-ACTION: Status
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 0.26s | 0.00s
-Max RSS | 0.02 MB | 0.00 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 6 | 27
-Repo size | 20 KB | 116 KB
--------------------------------------------------------------
+Unix diff doesn't compute binary deltas. Rolled a basic binary diff that works
+well enough except for small changes that shift bytes:
-ACTION: Add
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 2.82s | 4.62s
-Max RSS | 0.02 MB | 0.01 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 5055 | 5284
-Repo size | 89444 KB | 70360 KB
--------------------------------------------------------------
+```
+my $patch = pack("Q", $new_size);
+while (1) {
+ my $read_new = sysread($f_new, my $buf_new, $blk_size);
+ my $read_old = sysread($f_old, my $buf_old, $blk_size);
+ last if !$read_new && !$read_old;
-ACTION: Commit
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 1.18s | 0.93s
-Max RSS | 0.03 MB | 0.01 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 5264 | 5342
-Repo size | 91620 KB | 70592 KB
--------------------------------------------------------------
+ # If blocks differ, record the change
+ if (($buf_new // '') ne ($buf_old // '')) {
+ # Format: Offset (Q), Length (L), raw data
+ $patch .= pack("QL", $offset, length($buf_new)) . $buf_new;
+ }
+ $offset += $blk_size;
+}
+```
-ACTION: Status (Clean)
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 0.34s | 0.10s
-Max RSS | 0.02 MB | 0.01 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 5264 | 5342
-Repo size | 91620 KB | 70592 KB
--------------------------------------------------------------
+Benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
+<pre class="pre-no-style">
=============================================================
REBASE BENCHMARK: 1000 files (100 commits)
CONDITIONS: Depth=2, Files Mod=5%, Change=50%
@@ -241,8 +146,8 @@ storage. Over 80 commits, git wrote 17 MB to track a 17 MB repo. Urn only wrote
went from 1,578 to 1,693.
Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git
-used up 1819 inodes and 59 MB pre-GC. After GC, inode count dropped to 1514,
-but space only shrank by 6 MB.
+used up 1819 inodes and 59 MB pre-GC. After GC, the inode count dropped to
+1514, but the size only shrank by 6 MB.
Commit: <a
href="https://git.asciimx.com/urn/commit/?id=49ae7748e4a95afa1fd9d08f4886952dfc1deca4"