| -rw-r--r-- | _log/vcs-1.md | 85 |
1 files changed, 30 insertions, 55 deletions
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index 2611432..f7520df 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -1,24 +1,36 @@
 ---
 title: Built and benchmarked Urn against Git
-date: 2026-04-30
+date: 2026-05-01
 layout: post
 ---
 
-Implemented init, status, add, commit, log, show, and diff. Tracks regular
-files, symlinks. Didn't bother with collaborative workflows.
+Implemented init, status, add, commit, log, show, and diff commands. Depends
+only on OpenBSD base system tools. Didn't bother with collaborative workflows.
 
-Replaced the initial work tree mirroring with symlinks to a path-sorted index;
-Minimizes inode churn; Avoids walking directories on every command.
+Initial design mirrored the work tree using symlinks. Using filesystem as a
+database felt clever, but walking directories on every command and the inode
+churn were untenable. Replaced the symlink architecture with a path-sorted
+index.
 
-Index tracks paths, mtimes, sizes, and SHA-1 hashes of staged, committed, and
-base files. Like Git, used mtime and size to skip entries that didn't change.
-Excluded file permissions for now.
+The index tracks path, mtime, size, and SHA-1 hashes of staged, committed, and
+base files. Hashing is skipped when mtime and size are unchanged. If the file
+and the index share the same timestamp, it's rehashed to catch sub-second
+changes.
 
-Used a two-finger walk with the index to scan work/commit trees. Linear index
-access trades random-access speed for sequential IO; keeps memory footprint
+Implemented directory scans as a two-finger walk with the index. Linear index
+access trades random-access speed for sequential IO and keeps memory footprint
 low.
 
-Operations run in memory, using text streams and pipes wherever possible. Left
+Commits save staged files, trees, and deltas to the content-addressable object
+store. Bundled deltas into tarballs to conserve inodes. Gzipped objects larger
+than 512 bytes. The threshold was arbitrary. Did not tune further.
+
+Deltas, computed using diff, target the original file. Subsequent versions are
+reconstructed via a single patch—no chains. When the delta exceeds the rebase
+threshold, the file becomes the new base. Diff output is bloated but compresses
+well, so rebase threshold is set to 1.4, assuming a 30-40% compression ratio.
+
+Commands run in memory, using text streams and pipes wherever possible. Left
 MEM_LIMIT configurable to fall back to disk for large repositories:
 
 ```
@@ -42,20 +54,6 @@ if ((!$use_disk && $tot_size > MEM_LIMIT) ||
 }
 ```
 
-Commits save staged files, trees, and the deltas to the object store. Bundled
-deltas into tarballs to conserve inodes. Gzipped objects larger than 512 bytes
-(length of tar + gzip headers). Object store is content-addressable.
-
-Deltas target the original file (base). Subsequent versions are reconstructed
-via one patch—no chains. When the delta exceeds the rebase threshold, the file
-becomes the new base.
-
-Avoiding frequent rebases is key. Diff output is bloated but compresses well.
-Set rebase threshold to 1.4 expecting 30-40% compression ratio.
-
-Unix diff doesn't compute binary deltas. Rolled a basic binary diff to stay in
-the base system. Works well enough except for small changes that shift bytes.
-
 Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8):
 
 <pre class="pre-no-style">
@@ -76,17 +74,6 @@ Inodes          | 1300                 | 1425
 Repo size       | 6836 KB              | 8296 KB
 -------------------------------------------------------------
 
-SNAPSHOT: Commit #40
--------------------------------------------------------------
-METRIC          | URN                  | GIT
-----------------+----------------------+---------------------
-Time            | 0.29s                | 0.03s
-Max RSS         | 0.02 MB              | 0.01 MB
-Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
-Inodes          | 1340                 | 1566
-Repo size       | 7332 KB              | 9268 KB
--------------------------------------------------------------
-
 SNAPSHOT: Commit #60
 -------------------------------------------------------------
 METRIC          | URN                  | GIT
@@ -98,17 +85,6 @@ Inodes          | 1381                 | 1706
 Repo size       | 7896 KB              | 10236 KB
 -------------------------------------------------------------
 
-SNAPSHOT: Commit #80
--------------------------------------------------------------
-METRIC          | URN                  | GIT
-----------------+----------------------+---------------------
-Time            | 0.35s                | 0.03s
-Max RSS         | 0.02 MB              | 0.01 MB
-Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
-Inodes          | 1421                 | 1847
-Repo size       | 8456 KB              | 11200 KB
--------------------------------------------------------------
-
 SNAPSHOT: Commit #100
 -------------------------------------------------------------
 METRIC          | URN                  | GIT
@@ -131,17 +107,16 @@ TOTAL URN REBASES: 0
 
 Git wins on speed and memory.
 
-On storage, Urn shows more promise. Git wrote 12 MB to track a 17 MB
-repository; Urn wrote 9 MB. Over 80 commits, Git's inode consumption grew by 562.
-Urn's crept from 1,300 to 1,462.
+On storage, Urn shows promise. Git wrote 12 MB to track a 17 MB repository; Urn
+wrote 9 MB. Over 80 commits, Git's inode consumption grew by 562. Urn's crept
+from 1,300 to 1,462.
 
-Then happened the GC. Inodes: 41. Space recovered: 8.4 MB.
+Then fell the GC hammer. Inodes: 41. Space recovered: 8.4 MB.
 
 Urn's sequential IO and reduced write frequency are theoretically gentler on
-the NAND gates. Git's dramatic GC pass (12 MB → 3.8 MB) incurs write
-amplification Urn likely avoids.
-
-Precise impact on SSD TBW and write amplification, however, remains unknown.
+NAND. Git's dramatic GC pass (12 MB → 3.8 MB) incurs SSD wear Urn likely
+avoids. Precise impact on TBW and write amplification, however, remains
+unknown.
 
 Commit: <a
 href="https://git.asciimx.com/urn/commit/?id=79d9ec2bdef0a82172fa0aa56f12004bef206c04"
