Diffstat (limited to '_log')
-rw-r--r-- _log/vcs-1.md | 65
1 files changed, 27 insertions, 38 deletions
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index ac0cc97..63addda 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -1,19 +1,17 @@
---
-title: 'Urn: Exploring a SSD-friendly VCS architecture'
+title: What would VCS look like if SSD wear was the primary constraint?
date: 2026-04-20
layout: post
---
-Git takes up 1819 inodes and 59 MB to track a 36.59 MB repository pre-GC. GC
-collected 1514 inodes. Packing only reclaimed 6 MB. Can we do better?
+Git: 1819 inodes, 59 MB to track a 36.59 MB repo pre-GC. GC collected 1514
+inodes. Packing reclaimed 6 MB. Can we do better?
-PoC implements status, add, commit, log, show, and diff; Supports symlinks,
-binary files. Architecture tries to balance speed, memory, and storage;
-Prioritizes SSD longevity—sequential read/writes, reduced TBW/WA—to the extent
-Perl and the OS let us.
+PoC implements status, add, commit, log, show, diff. Supports symlinks, binary
+files. Optimized for SSD longevity — sequential reads/writes, reduced TBW/WA.
-A sorted index tracks files. Staging an object copies it to staging area, adds
-path, mtime, size, content hash (SHA-1) to index:
+Architecture: sorted index tracks files. Staging copies object to staging area,
+records path, mtime, size, SHA-1:
```
my $p = $wrk_entry->{path};
@@ -31,24 +29,17 @@ printf $out "%-40s\t%-40s\t%-40s\t%-12d\t%-10d\t%s\n",
$wrk_entry->{size}, $p;
```
-Urn doesn't use delta chains or a CAS. It stores one copy of a file (base).
-Revisions are recorded as patches relative to the base. When the patch outgrows
-the file, urn snapshots that file and tracks future modifications relative to
-the new base.
+No CAS, no delta chains. One base file per tracked object. Revisions are
+patches against the base. When the patch outgrows the file, snapshot and
+rebase. Commits record directory structure and patchset. Patches stored in a
+tarball—one inode per commit. Tarballs >512 bytes gzipped. Trees deduplicated
+by content hash.
-Commits record the directory structure and patchset. Patches are stored in a
-tarball to minimize inode usage. Tarballs larger than 512 bytes are gzipped.
-Compressing anything less isn't worth the effort. Trees are deduplicated using
-the content hash. Multiple commits point to a single tree.
+Diff/show: look up revision, find tree and patchset, apply patch. O(1)
+checkout. Single corrupt patch can't poison history.
-Diff/show commands look up revision, find tree and patchset in the object
-store, and apply the patch. Single-patch model keeps checkout operations at
-O(1) and resists a single corrupt patch corrupting history.
-
-External tools used like sort, diff, patch, tar, gzip are in the base system.
-Pipes and streams are used to communicate with them to keep the memory
-footprint low. Most operations are executed in memory. MEM_LIMIT can be used to
-fallback to disk when working with large repositories:
+External tools: sort, diff, patch, tar, gzip—all base system. Pipes and streams
+throughout. MEM_LIMIT falls back to disk for large repos:
```
use constant MEM_LIMIT => 64 * 1024 * 1024;
@@ -78,8 +69,8 @@ if (!$use_disk) {
}
```
-Index/tree processing is O(N)—two-finger walk. To keep IO transparent, they are
-streamed line-by-line, using carefully calibrated buffers, instead of mmap.
+Index and tree processing: O(N) two-finger walk. Streamed line-by-line,
+calibrated buffers. No mmap.
## Benchmarks
@@ -193,8 +184,7 @@ Repo size | 91620 KB | 70592 KB
### Impact of commits over time
-Each commit modifies 2% of files. Modifications simulate small patches: a few
-lines added, and a few lines deleted.
+Each commit modifies 2% of files — a few lines added, a few deleted.
```
=============================================================
@@ -257,15 +247,14 @@ Repo Size | 19868 KB | 49840 KB
-------------------------------------------------------------
```
-Overall, git has the speed and memory advantage. Curiously though, on a cold
-start, urn's add + commit time beats git's, while git's aggressive zlib
-compression beats urn's disk usage.
+Git wins on speed and memory. Cold start is the exception — urn's add + commit
+beats git there. Git's zlib compression wins on initial disk usage.
-Performance profile over many commits tells a different story, however. Git's
-optimized C core eventually and consistently outperforms urn. Urn's disk usage,
-which is what it was optimized for, is more stable than git's. In 80 commits,
-git wrote 27 MB of data to the disk, while urn only wrote 0.6 MB. Git's inode
-use exploded from 2,122 to 5,341. Urn only went from 1,302 to 1,464.
+Over time the picture flips. 80 commits: git wrote 27 MB, urn wrote 0.6 MB.
+Git's inode count went from 2,122 to 5,341. Urn's went from 1,302 to 1,464. The
+thing it was built to do, it does.
-Verdict: viable.
+Commit: <a
+href="https://git.asciimx.com/urn/commit/?id=57eb41d13914c2fdadcb863d36d73848a5fd589b"
+class="external" target="_blank" rel="noopener noreferrer">57eb41d</a>
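
The post describes index and tree processing as an O(N) two-finger walk over sorted records. A minimal sketch of that idea follows; it is not taken from urn's source. The name `two_finger_walk`, the `[path, hash]` record shape, and the in-memory arrays are assumptions made for illustration (the post says the real code streams both inputs line by line).

```perl
use strict;
use warnings;

# Hypothetical helper: compare two path-sorted lists of [path, hash]
# records in a single pass. Both inputs must already be sorted by path
# using the same ordering as `cmp`.
sub two_finger_walk {
    my ($old, $new) = @_;
    my ($i, $j) = (0, 0);    # one "finger" per list
    my %out = (added => [], deleted => [], modified => []);

    while ($i < @$old && $j < @$new) {
        my $cmp = $old->[$i][0] cmp $new->[$j][0];
        if ($cmp < 0) {
            # Path exists only in the old list.
            push @{ $out{deleted} }, $old->[$i++][0];
        }
        elsif ($cmp > 0) {
            # Path exists only in the new list.
            push @{ $out{added} }, $new->[$j++][0];
        }
        else {
            # Same path on both sides: compare content hashes.
            push @{ $out{modified} }, $old->[$i][0]
                if $old->[$i][1] ne $new->[$j][1];
            $i++;
            $j++;
        }
    }

    # Whatever remains on either side is a pure deletion or addition.
    push @{ $out{deleted} }, $old->[$i++][0] while $i < @$old;
    push @{ $out{added} },   $new->[$j++][0] while $j < @$new;

    return \%out;
}
```

Each record is visited once, so the cost is linear in the combined length of the two lists. A streaming variant would read the next line from each file handle instead of indexing into arrays.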