Diffstat (limited to '_log')
-rw-r--r-- _log/arduino-due.md |   2
-rw-r--r-- _log/site-search.md |   2
-rw-r--r-- _log/vcs-1.md       | 271
3 files changed, 273 insertions, 2 deletions
diff --git a/_log/arduino-due.md b/_log/arduino-due.md
index 1881547..900098d 100644
--- a/_log/arduino-due.md
+++ b/_log/arduino-due.md
@@ -1,5 +1,5 @@
---
-title: ATSAM3X8E bare-metal notes
+title: Bare-metal ATSAM3X8E
date: 2024-09-16
layout: post
---
diff --git a/_log/site-search.md b/_log/site-search.md
index 20c66de..b0b1d32 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -1,5 +1,5 @@
---
-title: Suffix-array search for static sites
+title: Search engine for static sites
date: 2026-01-03
layout: post
---
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
new file mode 100644
index 0000000..ac0cc97
--- /dev/null
+++ b/_log/vcs-1.md
@@ -0,0 +1,271 @@
+---
+title: 'Urn: Exploring an SSD-friendly VCS architecture'
+date: 2026-04-20
+layout: post
+---
+
+Git takes up 1819 inodes and 59 MB to track a 36.59 MB repository pre-GC. GC
+collected 1514 inodes. Packing only reclaimed 6 MB. Can we do better?
+
+The PoC implements status, add, commit, log, show, and diff, and supports
+symlinks and binary files. The architecture tries to balance speed, memory, and
+storage, and prioritizes SSD longevity (sequential reads and writes, lower TBW
+and write amplification) to the extent Perl and the OS let us.
+
+A sorted index tracks files. Staging an object copies it to the staging area and
+adds its path, mtime, size, and content hash (SHA-1) to the index:
+
+```
+my $p = $wrk_entry->{path};
+my $current_hash = hash_file_content($p);
+my $stg_path = File::Spec->catfile(TMP_DIR, $p);
+make_path(dirname($stg_path));
+
+# Keep symlinks as symlinks; copy regular files.
+my $ok = (-l $p) ? symlink(readlink($p), $stg_path) : copy($p, $stg_path);
+$ok or die "cannot stage $p: $!\n";
+
+printf $out "%-40s\t%-40s\t%-40s\t%-12d\t%-10d\t%s\n",
+ $current_hash, $idx_entry->{c_hash},
+ $idx_entry->{b_hash}, $wrk_entry->{mtime},
+ $wrk_entry->{size}, $p;
+```
+
+Urn doesn't use delta chains or a content-addressable store (CAS). It stores one
+copy of a file (the base) and records revisions as patches relative to that base.
+When the patch outgrows the file, urn snapshots the file and tracks future
+modifications relative to the new base.
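+
+A minimal sketch of that rebase decision, assuming "outgrows" means the patch
+is larger in bytes than the file it applies to. `needs_new_base` is a
+hypothetical helper, not a function taken from urn:
+
+```
+use strict;
+use warnings;
+
+# Hypothetical helper: true when the patch has grown past the file itself,
+# i.e. storing a fresh snapshot is cheaper than keeping the patch.
+sub needs_new_base {
+    my ($patch_len, $file_len) = @_;
+    return $patch_len > $file_len ? 1 : 0;
+}
+
+for my $case ([120, 4096], [5000, 4096]) {
+    my ($patch_len, $file_len) = @$case;
+    printf "%5d-byte patch, %d-byte file: %s\n", $patch_len, $file_len,
+        needs_new_base($patch_len, $file_len) ? "new base" : "keep patching";
+}
+```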
+
+Commits record the directory structure and patchset. Patches are stored in a
+tarball to minimize inode usage. Tarballs larger than 512 bytes are gzipped;
+compressing anything smaller isn't worth the effort. Trees are deduplicated by
+content hash, so multiple commits can point to a single tree.
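+
+A rough sketch of both rules. The in-memory `%store`, the one-line tree
+serialization, and the names `put_tree`, `should_gzip`, and `GZIP_MIN` are
+assumptions made for illustration; only the 512-byte threshold and hashing by
+content come from the description above:
+
+```
+use strict;
+use warnings;
+use Digest::SHA qw(sha1_hex);
+
+use constant GZIP_MIN => 512;    # bytes; threshold quoted in the text
+
+my %store;                       # stand-in for the on-disk object store
+
+# Store a serialized tree under its SHA-1. An identical tree hashes to the
+# same key, so a second commit reuses the existing entry.
+sub put_tree {
+    my ($tree_text) = @_;
+    my $id = sha1_hex($tree_text);
+    $store{$id} //= $tree_text;
+    return $id;
+}
+
+# Only tarballs larger than the threshold are worth compressing.
+sub should_gzip {
+    my ($len) = @_;
+    return ($len > GZIP_MIN) ? 1 : 0;
+}
+
+my $first  = put_tree("README\nsrc/main.pl\n");
+my $second = put_tree("README\nsrc/main.pl\n");    # unchanged tree, next commit
+print "commits share tree $first\n" if $first eq $second;
+print scalar(keys %store), " tree object(s) stored\n";
+print "300-byte tarball: ", (should_gzip(300) ? "gzip" : "store as-is"), "\n";
+```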
+
+Diff/show commands look up the revision, find its tree and patchset in the
+object store, and apply the patch to the base. With one patch per revision,
+checkout stays O(1) in history length and a corrupt patch damages one revision.
+
+External tools (sort, diff, patch, tar, gzip) all come from the base system.
+Urn talks to them through pipes and streams to keep its memory footprint low.
+Most operations run in memory; above MEM_LIMIT, urn falls back to disk so that
+large repositories still work:
+
+```
+use constant MEM_LIMIT => 64 * 1024 * 1024;
+use constant CHUNK_LEN => 8192;
+use constant IO_LAYER => ":raw:perlio(layer=" . CHUNK_LEN . ")";
+
+if (!$use_disk) {
+ @buf = sort @buf;
+ return sub {
+ my $line = shift @buf;
+ return unless $line;
+ chomp $line;
+ my ($p, $m, $s) = split(/\t/, $line);
+ return { path => $p, mtime => $m, size => $s };
+ };
+} else {
+ $flush->() if @buf;
+ close $tmp_fh;
+ open(my $sort_fh, "-|", "sort", "-t", "\t", "-k1,1", $tmp_path) or die $!;
+ return sub {
+ my $line = <$sort_fh>;
+ unless ($line) { close $sort_fh; return; }
+ chomp $line;
+ my ($p, $m, $s) = split(/\t/, $line);
+ return { path => $p, mtime => $m, size => $s };
+ };
+}
+```
+
+Index/tree processing is O(N), a two-finger walk over sorted streams. To keep IO
+transparent, they are read line by line through calibrated buffers, not mmap.
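+
+The walk can be sketched as below, assuming both sides are closures that
+return path-sorted `{ path, mtime, size }` records and undef at the end, like
+the iterators above. `iter_from`, `two_finger_walk`, and the one-letter status
+codes are illustrative, not urn's own:
+
+```
+use strict;
+use warnings;
+
+# Wrap a path-sorted list of records in an iterator closure.
+sub iter_from {
+    my @items = @_;
+    return sub { return shift @items };
+}
+
+# Advance two sorted iterators in lockstep; each record is read exactly once.
+sub two_finger_walk {
+    my ($idx_next, $wrk_next) = @_;
+    my $i = $idx_next->();
+    my $w = $wrk_next->();
+    my @out;
+    while (defined $i || defined $w) {
+        my $cmp = !defined($i) ? 1
+                : !defined($w) ? -1
+                : ($i->{path} cmp $w->{path});
+        if ($cmp < 0) {             # only in the index: deleted
+            push @out, "D $i->{path}";
+            $i = $idx_next->();
+        } elsif ($cmp > 0) {        # only in the working tree: untracked
+            push @out, "? $w->{path}";
+            $w = $wrk_next->();
+        } else {                    # in both: compare cheap metadata
+            push @out, "M $i->{path}"
+                if $i->{mtime} != $w->{mtime} || $i->{size} != $w->{size};
+            $i = $idx_next->();
+            $w = $wrk_next->();
+        }
+    }
+    return @out;
+}
+
+my $idx = iter_from(
+    { path => "a", mtime => 1, size => 10 },
+    { path => "b", mtime => 2, size => 20 },
+    { path => "c", mtime => 3, size => 30 },
+);
+my $wrk = iter_from(
+    { path => "a", mtime => 1, size => 10 },
+    { path => "b", mtime => 5, size => 21 },
+    { path => "d", mtime => 4, size => 40 },
+);
+print "$_\n" for two_finger_walk($idx, $wrk);    # M b, D c, ? d
+```
+
+Both inputs must be sorted with the same collation. Perl's `cmp` compares
+bytewise unless `use locale` is in effect, so an external sort(1) would need a
+matching locale.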
+
+## Benchmarks
+
+Performed on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
+
+### Impact of repository size
+
+Small repo:
+
+```
+=============================================================
+ BENCHMARK: 200 files @ 20 depth
+=============================================================
+
+ACTION: Status
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.10s | 0.00s
+Max RSS | 0.02 MB | 0.00 MB
+Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
+Inodes | 6 | 27
+Repo size | 20 KB | 116 KB
+-------------------------------------------------------------
+
+ACTION: Add
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.16s | 0.17s
+Max RSS | 0.02 MB | 0.00 MB
+Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
+Inodes | 225 | 360
+Repo size | 3700 KB | 3348 KB
+-------------------------------------------------------------
+
+ACTION: Commit
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.17s | 0.04s
+Max RSS | 0.02 MB | 0.01 MB
+Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
+Inodes | 347 | 397
+Repo size | 4212 KB | 3496 KB
+-------------------------------------------------------------
+
+ACTION: Status (Clean)
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.10s | 0.01s
+Max RSS | 0.02 MB | 0.00 MB
+Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
+Inodes | 347 | 397
+Repo size | 4212 KB | 3496 KB
+-------------------------------------------------------------
+```
+
+Larger repo:
+
+```
+=============================================================
+ BENCHMARK: 5000 files @ 50 depth
+=============================================================
+
+ACTION: Status
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.26s | 0.00s
+Max RSS | 0.02 MB | 0.00 MB
+Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
+Inodes | 6 | 27
+Repo size | 20 KB | 116 KB
+-------------------------------------------------------------
+
+ACTION: Add
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 2.82s | 4.62s
+Max RSS | 0.02 MB | 0.01 MB
+Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
+Inodes | 5055 | 5284
+Repo size | 89444 KB | 70360 KB
+-------------------------------------------------------------
+
+ACTION: Commit
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 1.18s | 0.93s
+Max RSS | 0.03 MB | 0.01 MB
+Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
+Inodes | 5264 | 5342
+Repo size | 91620 KB | 70592 KB
+-------------------------------------------------------------
+
+ACTION: Status (Clean)
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.34s | 0.10s
+Max RSS | 0.02 MB | 0.01 MB
+Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
+Inodes | 5264 | 5342
+Repo size | 91620 KB | 70592 KB
+-------------------------------------------------------------
+```
+
+### Impact of commits over time
+
+Each commit modifies 2% of files. Modifications simulate small patches: a few
+lines added, and a few lines deleted.
+
+```
+=============================================================
+ HISTORY BENCHMARK: 1000 files (100 commits)
+=============================================================
+
+SNAPSHOT: Commit #20
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.34s | 0.18s
+Max RSS | 0.02 MB | 0.01 MB
+Page Faults | Maj:0/Min:0 | Maj:0/Min:0
+Inodes | 1302 | 2122
+Repo Size | 19220 KB | 21944 KB
+-------------------------------------------------------------
+
+SNAPSHOT: Commit #40
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.41s | 0.11s
+Max RSS | 0.02 MB | 0.01 MB
+Page Faults | Maj:0/Min:0 | Maj:0/Min:0
+Inodes | 1342 | 2924
+Repo Size | 19380 KB | 28848 KB
+-------------------------------------------------------------
+
+SNAPSHOT: Commit #60
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.41s | 0.09s
+Max RSS | 0.02 MB | 0.01 MB
+Page Faults | Maj:0/Min:0 | Maj:0/Min:0
+Inodes | 1383 | 3719
+Repo Size | 19544 KB | 35796 KB
+-------------------------------------------------------------
+
+SNAPSHOT: Commit #80
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.42s | 0.11s
+Max RSS | 0.02 MB | 0.01 MB
+Page Faults | Maj:0/Min:0 | Maj:0/Min:0
+Inodes | 1424 | 4532
+Repo Size | 19708 KB | 42868 KB
+-------------------------------------------------------------
+
+SNAPSHOT: Commit #100
+-------------------------------------------------------------
+METRIC | URN | GIT
+----------------+----------------------+---------------------
+Time | 0.40s | 0.10s
+Max RSS | 0.02 MB | 0.01 MB
+Page Faults | Maj:0/Min:0 | Maj:0/Min:0
+Inodes | 1464 | 5341
+Repo Size | 19868 KB | 49840 KB
+-------------------------------------------------------------
+```
+
+Overall, git has the speed and memory advantage. Curiously though, on a cold
+start with 5000 files, urn's add + commit time (4.00s) beats git's (5.55s), while
+git's aggressive zlib compression wins on disk usage (70592 KB vs. 91620 KB).
+
+The profile over many commits tells a different story, however. Git's optimized
+C core commits consistently faster than urn (roughly 0.1s vs. 0.4s). Urn's disk
+usage, which is what it was optimized for, is far more stable than git's. Between
+commits #20 and #100, git's repository grew by 27 MB, while urn's grew by 0.6 MB.
+Git's inode use exploded from 2,122 to 5,341. Urn's only went from 1,302 to 1,464.
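+
+The growth figures follow from the commit #20 and commit #100 snapshots. A
+short check of the arithmetic, taking 1 MB as 1024 KB (`growth` is just a
+helper for this calculation):
+
+```
+use strict;
+use warnings;
+
+# Repo size (KB) and inode counts copied from the two snapshots.
+my %at20  = (urn => { kb => 19220, inodes => 1302 },
+             git => { kb => 21944, inodes => 2122 });
+my %at100 = (urn => { kb => 19868, inodes => 1464 },
+             git => { kb => 49840, inodes => 5341 });
+
+# Growth between the snapshots: (megabytes, inodes).
+sub growth {
+    my ($vcs) = @_;
+    return (($at100{$vcs}{kb} - $at20{$vcs}{kb}) / 1024,
+            $at100{$vcs}{inodes} - $at20{$vcs}{inodes});
+}
+
+for my $vcs (qw(urn git)) {
+    printf "%s: +%.1f MB, +%d inodes\n", $vcs, growth($vcs);
+}
+# urn: +0.6 MB, +162 inodes
+# git: +27.2 MB, +3219 inodes
+```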
+
+Verdict: viable.
+