-rw-r--r-- _log/vcs-1.md 45
1 files changed, 23 insertions, 22 deletions
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index 70531bc..c07d3bd 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -4,23 +4,22 @@ date: 2026-04-23
layout: post
---
-Urn is an experimental VCS built to minimize SSD wear, write amplification, and
-inode churn, even when that costs CPU time.
+Implemented init, status, add, commit, log, show, and diff. Tracks regular
+files, symlinks. Didn't bother with collaborative workflows.
-Implemented init, status, add, commit, log, show, and diff. Handles text files,
-symlinks, and binary files. Collaborative workflows are out of scope.
+Moved away from the initial work tree mirroring with symlinks to an index to
+minimize inode churn and opening directories on every status/add command.
-Sorted index tracks file paths, SHA-1 hashes of staged files (staged hash),
-parent (commit hash), and base (base hash), mtime, and size. Permissions are
-not tracked. mtime and size are used to avoid computing hashes for files that
-didn't change.
+Implemented path-sorted index to track staged, commit, and base SHA-1 hashes,
+mtime, and size. Like git, mtime and size are used to skip entries that didn't
+change. Excluded file permissions.
-When a new file is committed, it's saved in the object store as the base. When
+When a new file is committed, it’s saved in the object store as the base. When
the file changes, diff generates a patch against the base. If the patch is
larger than the file, the file becomes the new base.
-Unix diff doesn't handle binary files well. Rolled a diff for binary files that
-works well enough except for small changes that shift bytes:
+Unix diff doesn't compute binary deltas. Rolled a basic binary diff that works
+well enough except for small changes that shift bytes:
```
my $patch = pack("Q", $new_size);
@@ -38,15 +37,17 @@ while (1) {
}
```
-Commits store sorted lists of paths (tree), base hashes, and patch sets in the
-object store. Patches are stored as tarballs. Tarballs larger than 512 bytes
-are gzipped. Objects are content-addressable. To reconstruct a file, look up
-the revision, follow the base hash and apply the patch.
+A commit is a tree (list of paths and their base hashes) and a patch set.
+Patches are stored as tarballs. Gzipped tarballs larger than 512 bytes.
+Objects in the store are content-addressable. To reconstruct a file, look up
+the revision file, follow the base hash, and apply the patch.
-Status and add commands scan the work tree, sort entries by path and performs a
-two-finger walk with the index to minimize random access. Operations are
-performed in memory — often using text streams and pipes. MEM_LIMIT can be used
-to fall back on the disk for large repositories:
+Status and add commands scan the work tree, sort entries by path, and perform a
+two-finger walk with the index. Linear index access trades random-access speed
+for sequential IO.
+
+Operations are performed in memory — often using text streams and pipes.
+MEM_LIMIT can be used to fall back on the disk for large repositories:
```
my $flush = sub {
@@ -69,7 +70,7 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) ||
}
```
-Benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
+Performed benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
<pre class="pre-no-style">
=============================================================
@@ -240,8 +241,8 @@ storage. Over 80 commits, git wrote 17 MB to track a 17 MB repo. Urn only wrote
went from 1,578 to 1,693.
Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git
-used up 1819 inodes and 59 MB pre-GC. After GC inode count dropped to 1514, but
-space only shrank by 6 MB.
+used up 1819 inodes and 59 MB pre-GC. After GC, inode count dropped to 1514,
+but space only shrank by 6 MB.
Commit: <a
href="https://git.asciimx.com/urn/commit/?id=49ae7748e4a95afa1fd9d08f4886952dfc1deca4"