author Sadeep Madurange <sadeep@asciimx.com> 2026-04-29 17:10:11 +0800
committer Sadeep Madurange <sadeep@asciimx.com> 2026-04-30 18:40:53 +0800
commit 5c6536754ef175ac6bbf628de16e9491940a9d48
tree 04983275633b5808b2d6609e9620a8419814a8de /_log
parent 9d1971da3ece4fe4ec0a795d5ce9b122229e69ab
Fixed benchmarks and improved prose in VCS post.
Diffstat (limited to '_log')
-rw-r--r-- _log/vcs-1.md | 124
1 file changed, 59 insertions, 65 deletions
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index 3b60ff7..2611432 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -1,23 +1,22 @@
---
-title: Built an experimental SSD-friendly VCS
-date: 2026-04-23
+title: Built and benchmarked Urn against Git
+date: 2026-04-30
layout: post
---
Implemented init, status, add, commit, log, show, and diff. Tracks regular
files and symlinks. Didn't bother with collaborative workflows.
-Moved away from the initial work tree mirroring with symlinks to a path-sorted
-index to minimize inode churn and opening directories on every status/add
-command.
+Replaced the initial symlink-based work tree mirror with a path-sorted index.
+Minimizes inode churn and avoids walking directories on every command.
-Implemented the index to track staged, commit, and base SHA-1 hashes, mtime,
-and size. Like git, used mtime and size to skip entries that didn't change.
-Excluded file permissions.
+Index tracks paths, mtimes, sizes, and SHA-1 hashes of staged, committed, and
+base files. Like Git, used mtime and size to skip entries that didn't change.
+Excluded file permissions for now.
-Designed work/commit tree scans around a two-finger walk with the index. Linear
-index access trades random-access speed for sequential IO; keeps memory
-footprint low.
+Used a two-finger walk with the index to scan work/commit trees. Linear index
+access trades random-access speed for sequential IO; keeps memory footprint
+low.
Operations run in memory, using text streams and pipes wherever possible. Left
MEM_LIMIT configurable to fall back to disk for large repositories:
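The two-finger walk and the mtime/size shortcut described in this hunk can be sketched in Perl. This is a minimal illustration under stated assumptions, not Urn's code. In-memory arrays of hashes (`path`, `mtime`, `size`) stand in for the index file and the work tree scan. `two_finger_walk` and its `deleted`/`untracked`/`check` labels are hypothetical names. Both inputs are assumed to be pre-sorted by path using Perl's default string order (`cmp`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of a two-finger (merge) walk over two path-sorted lists.
# Hypothetical entry layout: { path, mtime, size }; the post's index also
# stores hashes. Both lists must be sorted with Perl's default string order.
sub two_finger_walk {
    my ($index, $work) = @_;
    my ($i, $j) = (0, 0);
    my @changes;
    while ($i < @$index || $j < @$work) {
        my $ip = $i < @$index ? $index->[$i]{path} : undef;
        my $wp = $j < @$work  ? $work->[$j]{path}  : undef;
        # Treat an exhausted list as sorting after every remaining entry
        # of the other list.
        my $c = !defined $ip ? 1 : !defined $wp ? -1 : $ip cmp $wp;
        if ($c < 0) {
            push @changes, "deleted:$ip";      # in index, gone from disk
            $i++;
        } elsif ($c > 0) {
            push @changes, "untracked:$wp";    # on disk, not in index
            $j++;
        } else {
            my ($ie, $we) = ($index->[$i], $work->[$j]);
            # Matching mtime and size: assume unchanged and skip hashing.
            # Otherwise flag the path so the caller hashes it to confirm.
            if ($ie->{mtime} != $we->{mtime} || $ie->{size} != $we->{size}) {
                push @changes, "check:$ip";
            }
            $i++;
            $j++;
        }
    }
    return @changes;
}

my @index = (
    { path => 'a.txt', mtime => 100, size => 10 },
    { path => 'b.txt', mtime => 100, size => 20 },
    { path => 'd.txt', mtime => 100, size => 30 },
);
my @work = (
    { path => 'a.txt', mtime => 100, size => 10 },    # untouched
    { path => 'b.txt', mtime => 200, size => 20 },    # mtime changed
    { path => 'c.txt', mtime => 300, size => 5  },    # new file
);
print "$_\n" for two_finger_walk(\@index, \@work);
# check:b.txt
# untracked:c.txt
# deleted:d.txt
```

Each list is read once, front to back. The walk therefore needs only one cursor per list, not a lookup table of every path, which fits the sequential index access and low memory footprint the post describes.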
@@ -43,112 +42,107 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) ||
}
```
-Implemented the commit command to atomically save (rename) staged files, the
-tree, and the deltas to the object store. Bundled deltas into tarballs to
-conserve inodes, and gzipped tarballs larger than 512 bytes. Objects in the
-store are content-addressable.
+Commits save staged files, trees, and the deltas to the object store. Bundled
+deltas into tarballs to conserve inodes. Gzipped objects larger than 512 bytes
+(length of tar + gzip headers). Object store is content-addressable.
-Computed deltas against the first version of a file (base) to simplify
-reconstruction via application of a single patch instead of delta chains. When
-the delta outgrows the file, the file becomes the new base.
+Deltas target the original file (base), so any later version is rebuilt by
+applying a single patch instead of walking a delta chain. When the delta grows
+past the rebase threshold relative to the file, the file becomes the new base.
-Unix diff doesn't compute binary deltas. Rolled a basic binary diff that works
-well enough except for small changes that shift bytes:
+Avoiding frequent rebases is key. Diff output is bloated but compresses well.
+Set rebase threshold to 1.4 expecting 30-40% compression ratio.
-```
-my $patch = pack("Q", $new_size);
-while (1) {
- my $read_new = sysread($f_new, my $buf_new, $blk_size);
- my $read_old = sysread($f_old, my $buf_old, $blk_size);
- last if !$read_new && !$read_old;
-
- # If blocks differ, record the change
- if (($buf_new // '') ne ($buf_old // '')) {
- # Format: Offset (Q), Length (L), raw data
- $patch .= pack("QL", $offset, length($buf_new)) . $buf_new;
- }
- $offset += $blk_size;
-}
-```
+Unix diff doesn't compute binary deltas. Rolled a basic binary diff to stay in
+the base system. Works well enough, except when a small insertion or deletion shifts the bytes after it.
-Benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
+Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8):
<pre class="pre-no-style">
=============================================================
- REBASE BENCHMARK: 1000 files (100 commits)
- CONDITIONS: Depth=2, Files Mod=5%, Change=50%
- INITIAL RAW DATA SIZE: 16976 KB
+ COMMIT BENCHMARK: 1000 files (100 commits)
+ CONDITIONS: Depth=2, Files Mod=0.5%, Line Mod=5%
+ INITIAL REPO SIZE: 17332 KB
=============================================================
SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
-Time | 0.29s | 0.05s
+Time | 0.29s | 0.03s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 1578 | 2334
-Repo size | 20404 KB | 19380 KB
+Inodes | 1300 | 1425
+Repo size | 6836 KB | 8296 KB
-------------------------------------------------------------
SNAPSHOT: Commit #40
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
-Time | 0.54s | 0.05s
+Time | 0.29s | 0.03s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 1607 | 3374
-Repo size | 20520 KB | 23788 KB
+Inodes | 1340 | 1566
+Repo size | 7332 KB | 9268 KB
-------------------------------------------------------------
SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
-Time | 0.31s | 0.05s
+Time | 0.35s | 0.03s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 1635 | 4414
-Repo size | 20632 KB | 28196 KB
+Inodes | 1381 | 1706
+Repo size | 7896 KB | 10236 KB
-------------------------------------------------------------
SNAPSHOT: Commit #80
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
-Time | 0.29s | 0.05s
+Time | 0.35s | 0.03s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 1664 | 5454
-Repo size | 20748 KB | 32596 KB
+Inodes | 1421 | 1847
+Repo size | 8456 KB | 11200 KB
-------------------------------------------------------------
SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
-Time | 0.54s | 0.10s
+Time | 0.35s | 0.03s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
-Inodes | 1693 | 6495
-Repo size | 20864 KB | 37008 KB
+Inodes | 1462 | 1987
+Repo size | 9020 KB | 12168 KB
-------------------------------------------------------------
-TOTAL URN REBASES: 273
+AFTER GIT GC
+-------------------------------------------------------------
+Final Size | 9020 KB | 3812 KB
+Final Inodes | 1462 | 41
+-------------------------------------------------------------
+
+TOTAL URN REBASES: 0
</pre>
-Git wins on speed and memory. On small repositories, Urn is competitive.
+Git wins on speed and memory.
+
+On storage, Urn shows more promise. Git wrote 12 MB to track a 17 MB
+repository; Urn wrote 9 MB. Over 80 commits, Git's inode count grew by 562
+(1,425 to 1,987); Urn's grew by only 162 (1,300 to 1,462).
+
+Then came the GC. Inodes: 41. Space recovered: 8.4 MB.
-In high-revision workloads with modest per-file churn, Urn beats git on
-storage. Over 80 commits, git wrote 17 MB to track a 17 MB repo. Urn only wrote
-0.5 MB despite 273 rebases. Git's inode count went from 2,334 to 6,495. Urn's
-went from 1,578 to 1,693.
+Urn's sequential IO and reduced write frequency should, in theory, be gentler
+on NAND flash. Git's dramatic GC pass (12 MB → 3.8 MB) rewrites the object
+store into a packfile, incurring write amplification Urn likely avoids.
-Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git
-used up 1819 inodes and 59 MB pre-GC. After GC, the inode count dropped to
-1514, but the size only shrank by 6 MB.
+Precise impact on SSD TBW and write amplification, however, remains unknown.
Commit: <a
-href="https://git.asciimx.com/urn/commit/?id=49ae7748e4a95afa1fd9d08f4886952dfc1deca4"
-class="external" target="_blank" rel="noopener noreferrer">49ae774</a>
+href="https://git.asciimx.com/urn/commit/?id=79d9ec2bdef0a82172fa0aa56f12004bef206c04"
+class="external" target="_blank" rel="noopener noreferrer">79d9ec2</a>
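The delta scheme described in this commit can be sketched in Perl. That scheme consists of one patch against the base, a rebase threshold of 1.4, and a home-grown binary diff. This is not Urn's implementation: the current diff code isn't shown here, so the sketch reuses the patch layout from the snippet this commit removes. That layout is the new size, followed by one offset/length/raw-bytes record per changed block. The following are assumptions:

- the names `block_diff`, `block_patch`, and `needs_rebase`;
- the 4096-byte block size;
- reading the threshold as "raw delta size over new file size".

Like the original snippet, `pack`'s `Q` format needs a perl built with 64-bit integers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $BLK          = 4096;  # hypothetical block size
my $REBASE_RATIO = 1.4;   # rebase threshold from the post

# Fixed-block binary diff using the removed snippet's patch layout:
# new size (Q), then one (offset Q, length L, raw bytes) record per block
# of $new that differs from the same block of $old.
sub block_diff {
    my ($old, $new) = @_;
    my $patch = pack("Q", length($new));
    for (my $off = 0; $off < length($new); $off += $BLK) {
        my $nb = substr($new, $off, $BLK);
        my $ob = $off < length($old) ? substr($old, $off, $BLK) : '';
        $patch .= pack("QL", $off, length($nb)) . $nb if $nb ne $ob;
    }
    return $patch;
}

# Rebuild a version from the base with a single patch (no delta chain).
sub block_patch {
    my ($base, $patch) = @_;
    my ($size) = unpack("Q", $patch);
    my ($out, $pos) = ($base, 8);
    while ($pos < length($patch)) {
        my ($off, $len) = unpack("QL", substr($patch, $pos, 12));
        $pos += 12;
        # Safety net: pad with zeros if a record starts past the end.
        $out .= "\0" x ($off - length($out)) if $off > length($out);
        substr($out, $off, $len) = substr($patch, $pos, $len);
        $pos += $len;
    }
    return substr($out, 0, $size);    # drop bytes past the new end
}

# Assumed reading of the threshold: rebase once the raw delta is more
# than 1.4 times the size of the new file.
sub needs_rebase {
    my ($delta_len, $file_len) = @_;
    return $delta_len > $REBASE_RATIO * $file_len;
}

my $base  = join '', map { chr(65 + $_ % 26) } 0 .. 9999;  # 10,000 bytes
my $edit  = $base;
substr($edit, 5000, 1) = '?';                              # in-place change
my $shift = 'X' . $base;                                   # 1-byte insert

for my $new ($edit, $shift) {
    my $p = block_diff($base, $new);
    printf "patch %d bytes, round-trip %s, %s\n", length($p),
        (block_patch($base, $p) eq $new ? 'ok' : 'FAIL'),
        (needs_rebase(length($p), length($new)) ? 'rebase' : 'keep base');
}
# patch 4116 bytes, round-trip ok, keep base
# patch 10045 bytes, round-trip ok, keep base
```

The second run shows the weakness the post mentions. A single inserted byte shifts every block, so the patch ends up slightly larger than the file itself. Under this sketch's reading of the 1.4 threshold, that still does not trigger a rebase.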