author: Sadeep Madurange <sadeep@asciimx.com> 2026-05-01 00:38:40 +0800
committer: Sadeep Madurange <sadeep@asciimx.com> 2026-05-01 09:25:22 +0800
commit: 76ae41e3cb961183a98813d593a76e5f5154aad7
tree: e776e5653272d650e95345724f82307715269030 /_log/vcs-1.md
parent: 5c6536754ef175ac6bbf628de16e9491940a9d48
Polished VCS prose.
Diffstat (limited to '_log/vcs-1.md')
-rw-r--r-- _log/vcs-1.md 85
1 files changed, 30 insertions, 55 deletions
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index 2611432..f7520df 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -1,24 +1,36 @@
---
title: Built and benchmarked Urn against Git
-date: 2026-04-30
+date: 2026-05-01
layout: post
---
-Implemented init, status, add, commit, log, show, and diff. Tracks regular
-files, symlinks. Didn't bother with collaborative workflows.
+Implemented init, status, add, commit, log, show, and diff commands. Depends
+only on OpenBSD base system tools. Didn't bother with collaborative workflows.
-Replaced the initial work tree mirroring with symlinks to a path-sorted index;
-Minimizes inode churn; Avoids walking directories on every command.
+Initial design mirrored the work tree using symlinks. Using filesystem as a
+database felt clever, but walking directories on every command and the inode
+churn were untenable. Replaced the symlink architecture with a path-sorted
+index.
-Index tracks paths, mtimes, sizes, and SHA-1 hashes of staged, committed, and
-base files. Like Git, used mtime and size to skip entries that didn't change.
-Excluded file permissions for now.
+The index tracks path, mtime, size, and SHA-1 hashes of staged, committed, and
+base files. Hashing is skipped when mtime and size are unchanged. If the file
+and the index share the same timestamp, it's rehashed to catch sub-second
+changes.
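The skip-or-rehash rule described above can be sketched roughly as follows. This is Python for illustration only, not Urn's code. The `IndexEntry` fields, the `needs_rehash` name, whole-second timestamps, and the use of the index file's own mtime as "the index timestamp" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical index record; the field names are assumptions."""
    path: str
    mtime: int  # whole seconds, as recorded when the entry was written
    size: int
    sha1: str


def needs_rehash(entry, file_mtime, file_size, index_mtime):
    """Return True when the file's content must be hashed again.

    index_mtime is assumed to be the second in which the index itself
    was last written.
    """
    # Any change in size or mtime means the cached hash cannot be trusted.
    if file_size != entry.size or file_mtime != entry.mtime:
        return True
    # Same second as the index write: a later edit within that second
    # would leave mtime and size looking unchanged, so hash to be safe.
    return file_mtime == index_mtime
```

With whole-second timestamps, the last check is what catches an edit made in the same second the index was written, at the cost of one extra hash.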
-Used a two-finger walk with the index to scan work/commit trees. Linear index
-access trades random-access speed for sequential IO; keeps memory footprint
+Implemented directory scans as a two-finger walk with the index. Linear index
+access trades random-access speed for sequential IO and keeps memory footprint
low.
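A two-finger walk over a path-sorted index and a path-sorted directory listing might look like the sketch below. It is a minimal illustration, assuming both inputs are sorted in the same order. The function name and the status labels are hypothetical, not taken from Urn.

```python
def two_finger_walk(index_paths, tree_paths):
    """Merge two path-sorted sequences, yielding (status, path) pairs.

    Each input is consumed once, front to back, so neither has to be
    held in memory as a lookup table.
    """
    idx, tree = iter(index_paths), iter(tree_paths)
    i, t = next(idx, None), next(tree, None)
    while i is not None or t is not None:
        if t is None or (i is not None and i < t):
            # Path is in the index but missing from the tree.
            yield ("deleted", i)
            i = next(idx, None)
        elif i is None or t < i:
            # Path is in the tree but not in the index.
            yield ("untracked", t)
            t = next(tree, None)
        else:
            # Same path on both sides; compare mtime/size/hash here.
            yield ("tracked", i)
            i, t = next(idx, None), next(tree, None)
```

Each side advances only forward, so the index can be read as a stream rather than loaded into a hash table.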
-Operations run in memory, using text streams and pipes wherever possible. Left
+Commits save staged files, trees, and deltas to the content-addressable object
+store. Bundled deltas into tarballs to conserve inodes. Gzipped objects larger
+than 512 bytes. The threshold was arbitrary. Did not tune further.
+
+Deltas, computed using diff, target the original file. Subsequent versions are
+reconstructed via a single patch—no chains. When the delta exceeds the rebase
+threshold, the file becomes the new base. Diff output is bloated but compresses
+well, so rebase threshold is set to 1.4, assuming a 30-40% compression ratio.
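The post does not say exactly what the 1.4 threshold is measured against. One plausible reading, sketched below, compares the uncompressed delta size to the size of the new file. A delta 1.4 times the file size that compresses by 30-40% then lands at roughly the file's own size, at which point storing a fresh base costs about the same. `difflib` stands in for diff(1), and the names `unified_delta` and `should_rebase` are hypothetical.

```python
import difflib

REBASE_THRESHOLD = 1.4  # value from the post; its exact meaning is assumed


def unified_delta(base: str, new: str) -> str:
    """Unified diff from base to new, standing in for diff(1) output."""
    return "".join(
        difflib.unified_diff(
            base.splitlines(keepends=True),
            new.splitlines(keepends=True),
            "base",
            "new",
        )
    )


def should_rebase(base: str, new: str, threshold: float = REBASE_THRESHOLD) -> bool:
    """True when the delta against the base is large enough that the new
    version should be stored whole and become the next base."""
    delta_size = len(unified_delta(base, new).encode())
    return delta_size > threshold * len(new.encode())
```

Because every delta targets the base directly, reconstructing any version takes a single patch, and this check is the only point at which the base moves.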
+
+Commands run in memory, using text streams and pipes wherever possible. Left
MEM_LIMIT configurable to fall back to disk for large repositories:
```
@@ -42,20 +54,6 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) ||
}
```
-Commits save staged files, trees, and the deltas to the object store. Bundled
-deltas into tarballs to conserve inodes. Gzipped objects larger than 512 bytes
-(length of tar + gzip headers). Object store is content-addressable.
-
-Deltas target the original file (base). Subsequent versions are reconstructed
-via one patch—no chains. When the delta exceeds the rebase threshold, the file
-becomes the new base.
-
-Avoiding frequent rebases is key. Diff output is bloated but compresses well.
-Set rebase threshold to 1.4 expecting 30-40% compression ratio.
-
-Unix diff doesn't compute binary deltas. Rolled a basic binary diff to stay in
-the base system. Works well enough except for small changes that shift bytes.
-
Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8):
<pre class="pre-no-style">
@@ -76,17 +74,6 @@ Inodes | 1300 | 1425
Repo size | 6836 KB | 8296 KB
-------------------------------------------------------------
-SNAPSHOT: Commit #40
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 0.29s | 0.03s
-Max RSS | 0.02 MB | 0.01 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 1340 | 1566
-Repo size | 7332 KB | 9268 KB
--------------------------------------------------------------
-
SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC | URN | GIT
@@ -98,17 +85,6 @@ Inodes | 1381 | 1706
Repo size | 7896 KB | 10236 KB
-------------------------------------------------------------
-SNAPSHOT: Commit #80
--------------------------------------------------------------
-METRIC | URN | GIT
-----------------+----------------------+---------------------
-Time | 0.35s | 0.03s
-Max RSS | 0.02 MB | 0.01 MB
-Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 1421 | 1847
-Repo size | 8456 KB | 11200 KB
--------------------------------------------------------------
-
SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC | URN | GIT
@@ -131,17 +107,16 @@ TOTAL URN REBASES: 0
Git wins on speed and memory.
-On storage, Urn shows more promise. Git wrote 12 MB to track a 17 MB
-repository; Urn wrote 9 MB. Over 80 commits, Git's inode consumption grew by 562.
-Urn's crept from 1,300 to 1,462.
+On storage, Urn shows promise. Git wrote 12 MB to track a 17 MB repository; Urn
+wrote 9 MB. Over 80 commits, Git's inode consumption grew by 562. Urn's crept
+from 1,300 to 1,462.
-Then happened the GC. Inodes: 41. Space recovered: 8.4 MB.
+Then fell the GC hammer. Inodes: 41. Space recovered: 8.4 MB.
Urn's sequential IO and reduced write frequency are theoretically gentler on
-the NAND gates. Git's dramatic GC pass (12 MB → 3.8 MB) incurs write
-amplification Urn likely avoids.
-
-Precise impact on SSD TBW and write amplification, however, remains unknown.
+NAND. Git's dramatic GC pass (12 MB → 3.8 MB) incurs SSD wear Urn likely
+avoids. Precise impact on TBW and write amplification, however, remains
+unknown.
Commit: <a
href="https://git.asciimx.com/urn/commit/?id=79d9ec2bdef0a82172fa0aa56f12004bef206c04"