Diffstat (limited to '_log')
| -rw-r--r-- | _log/arduino-due.md | 2 |
| -rw-r--r-- | _log/site-search.md | 2 |
| -rw-r--r-- | _log/vcs-1.md | 271 |
3 files changed, 273 insertions, 2 deletions
diff --git a/_log/arduino-due.md b/_log/arduino-due.md
index 1881547..900098d 100644
--- a/_log/arduino-due.md
+++ b/_log/arduino-due.md
@@ -1,5 +1,5 @@
 ---
-title: ATSAM3X8E bare-metal notes
+title: Bare-metal ATSAM3X8E
 date: 2024-09-16
 layout: post
 ---
diff --git a/_log/site-search.md b/_log/site-search.md
index 20c66de..b0b1d32 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -1,5 +1,5 @@
 ---
-title: Suffix-array search for static sites
+title: Search engine for static sites
 date: 2026-01-03
 layout: post
 ---
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
new file mode 100644
index 0000000..ac0cc97
--- /dev/null
+++ b/_log/vcs-1.md
@@ -0,0 +1,271 @@
+---
+title: 'Urn: Exploring an SSD-friendly VCS architecture'
+date: 2026-04-20
+layout: post
+---
+
+Git takes up 1819 inodes and 59 MB to track a 36.59 MB repository pre-GC. GC
+collected 1514 inodes. Packing only reclaimed 6 MB. Can we do better?
+
+PoC implements status, add, commit, log, show, and diff; supports symlinks and
+binary files. Architecture tries to balance speed, memory, and storage;
+prioritizes SSD longevity (sequential reads/writes, reduced TBW/WA) to the extent
+Perl and the OS let us.
+
+A sorted index tracks files. Staging an object copies it to the staging area and
+adds its path, mtime, size, and content hash (SHA-1) to the index:
+
+```
+my $p = $wrk_entry->{path};
+my $current_hash = hash_file_content($p);
+my $stg_path = File::Spec->catfile(TMP_DIR, $p);
+make_path(dirname($stg_path));
+
+(-l $p)
+    ? symlink(readlink($p), $stg_path)
+    : copy($p, $stg_path);
+
+printf $out "%-40s\t%-40s\t%-40s\t%-12d\t%-10d\t%s\n",
+    $current_hash, $idx_entry->{c_hash},
+    $idx_entry->{b_hash}, $wrk_entry->{mtime},
+    $wrk_entry->{size}, $p;
+```
+
+Urn doesn't use delta chains or a CAS. It stores one copy of a file (base).
+Revisions are recorded as patches relative to the base. When the patch outgrows
+the file, urn snapshots that file and tracks future modifications relative to
+the new base.
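The snapshot rule (keep patching against the base until the patch outgrows the file) can be sketched roughly as below. This is an illustration, not urn's code: `choose_storage` and its byte-length arguments are hypothetical, and reading "outgrows" as "strictly larger in bytes" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from urn. Given the size of the patch
# against the current base and the size of the file itself (both in bytes),
# decide how the revision would be stored.
sub choose_storage {
    my ($patch_len, $file_len) = @_;
    # Assumption: "outgrows" means strictly larger in bytes.
    return $patch_len > $file_len ? 'snapshot' : 'patch';
}

# A 1000-byte file whose patch against the base keeps growing.
for my $patch_len (120, 640, 1000, 1350) {
    printf "%5d-byte patch -> %s\n",
        $patch_len, choose_storage($patch_len, 1000);
}
```

Under this reading only the 1350-byte patch triggers a new base; the earlier revisions stay as single patches against the original copy.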
+
+Commits record the directory structure and patchset. Patches are stored in a
+tarball to minimize inode usage. Tarballs larger than 512 bytes are gzipped.
+Compressing anything less isn't worth the effort. Trees are deduplicated using
+the content hash. Multiple commits can point to a single tree.
+
+Diff/show commands look up the revision, find its tree and patchset in the object
+store, and apply the patch. The single-patch model keeps checkout operations at
+O(1) and limits the damage a single corrupt patch can do to history.
+
+The external tools used (sort, diff, patch, tar, gzip) are in the base system.
+Pipes and streams are used to communicate with them to keep the memory
+footprint low. Most operations are executed in memory. MEM_LIMIT can be used to
+fall back to disk when working with large repositories:
+
+```
+use constant MEM_LIMIT => 64 * 1024 * 1024;
+use constant CHUNK_LEN => 8192;
+use constant IO_LAYER => ":raw:perlio(layer=" . CHUNK_LEN . ")";
+
+if (!$use_disk) {
+    @buf = sort @buf;
+    return sub {
+        my $line = shift @buf;
+        return unless $line;
+        chomp $line;
+        my ($p, $m, $s) = split(/\t/, $line);
+        return { path => $p, mtime => $m, size => $s };
+    };
+} else {
+    $flush->() if @buf;
+    close $tmp_fh;
+    open(my $sort_fh, "-|", "sort", "-t", "\t", "-k1,1", $tmp_path) or die $!;
+    return sub {
+        my $line = <$sort_fh>;
+        unless ($line) { close $sort_fh; return; }
+        chomp $line;
+        my ($p, $m, $s) = split(/\t/, $line); # field order assumed to match the in-memory branch
+        return { path => $p, mtime => $m, size => $s };
+    };
+}
+```
+
+Index/tree processing is O(N): a two-finger walk. To keep IO transparent, they are
+streamed line-by-line, using carefully calibrated buffers, instead of mmap.
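For readers unfamiliar with the term, a two-finger walk over two path-sorted lists might look like the sketch below. It is not urn's implementation: `two_finger_walk`, the in-memory arrays, and the change labels are hypothetical, and the real tool streams entries rather than holding them in arrays. It assumes both lists are sorted with the same bytewise ordering that Perl's `cmp` uses.

```perl
use strict;
use warnings;

# Hypothetical sketch. $idx and $wrk are array refs of
# { path, mtime, size } hashes, each sorted by path.
# Every entry is visited once, so the walk is O(N).
sub two_finger_walk {
    my ($idx, $wrk) = @_;
    my ($i, $j) = (0, 0);
    my @changes;
    while ($i < @$idx || $j < @$wrk) {
        # An exhausted list compares as "greater", draining the other one.
        my $cmp = $i >= @$idx ? 1
                : $j >= @$wrk ? -1
                : $idx->[$i]{path} cmp $wrk->[$j]{path};
        if ($cmp < 0) {            # only in the index
            push @changes, [ deleted => $idx->[$i++]{path} ];
        } elsif ($cmp > 0) {       # only in the working tree
            push @changes, [ added => $wrk->[$j++]{path} ];
        } else {                   # in both: compare cheap metadata
            my ($old, $new) = ($idx->[$i++], $wrk->[$j++]);
            push @changes, [ modified => $old->{path} ]
                if $old->{mtime} != $new->{mtime}
                || $old->{size}  != $new->{size};
        }
    }
    return \@changes;
}

my $changes = two_finger_walk(
    [ { path => 'a', mtime => 1, size => 10 },
      { path => 'b', mtime => 1, size => 10 } ],
    [ { path => 'b', mtime => 2, size => 12 },
      { path => 'c', mtime => 1, size => 5 } ],
);
print join(' ', @$_), "\n" for @$changes;
```

One caveat when mixing this with an external `sort`: its collation depends on the locale, so both sides need to agree on the ordering for the comparison to be valid.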
+
+## Benchmarks
+
+Performed on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
+
+### Impact of repository size
+
+Small repo:
+
+```
+=============================================================
+ BENCHMARK: 200 files @ 20 depth
+=============================================================
+
+ACTION: Status
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.10s                | 0.00s
+Max RSS         | 0.02 MB              | 0.00 MB
+Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
+Inodes          | 6                    | 27
+Repo size       | 20 KB                | 116 KB
+-------------------------------------------------------------
+
+ACTION: Add
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.16s                | 0.17s
+Max RSS         | 0.02 MB              | 0.00 MB
+Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
+Inodes          | 225                  | 360
+Repo size       | 3700 KB              | 3348 KB
+-------------------------------------------------------------
+
+ACTION: Commit
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.17s                | 0.04s
+Max RSS         | 0.02 MB              | 0.01 MB
+Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
+Inodes          | 347                  | 397
+Repo size       | 4212 KB              | 3496 KB
+-------------------------------------------------------------
+
+ACTION: Status(Clean)
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.10s                | 0.01s
+Max RSS         | 0.02 MB              | 0.00 MB
+Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
+Inodes          | 347                  | 397
+Repo size       | 4212 KB              | 3496 KB
+-------------------------------------------------------------
+```
+
+Larger repo:
+
+```
+=============================================================
+ BENCHMARK: 5000 files @ 50 depth
+=============================================================
+
+ACTION: Status
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.26s                | 0.00s
+Max RSS         | 0.02 MB              | 0.00 MB
+Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
+Inodes          | 6                    | 27
+Repo size       | 20 KB                | 116 KB
+-------------------------------------------------------------
+
+ACTION: Add
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 2.82s                | 4.62s
+Max RSS         | 0.02 MB              | 0.01 MB
+Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
+Inodes          | 5055                 | 5284
+Repo size       | 89444 KB             | 70360 KB
+-------------------------------------------------------------
+
+ACTION: Commit
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 1.18s                | 0.93s
+Max RSS         | 0.03 MB              | 0.01 MB
+Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
+Inodes          | 5264                 | 5342
+Repo size       | 91620 KB             | 70592 KB
+-------------------------------------------------------------
+
+ACTION: Status (Clean)
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.34s                | 0.10s
+Max RSS         | 0.02 MB              | 0.01 MB
+Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
+Inodes          | 5264                 | 5342
+Repo size       | 91620 KB             | 70592 KB
+-------------------------------------------------------------
+```
+
+### Impact of commits over time
+
+Each commit modifies 2% of files. Modifications simulate small patches: a few
+lines added, and a few lines deleted.
+
+```
+=============================================================
+ HISTORY BENCHMARK: 1000 files (100 commits)
+=============================================================
+
+SNAPSHOT: Commit #20
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.34s                | 0.18s
+Max RSS         | 0.02 MB              | 0.01 MB
+Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
+Inodes          | 1302                 | 2122
+Repo Size       | 19220 KB             | 21944 KB
+-------------------------------------------------------------
+
+SNAPSHOT: Commit #40
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.41s                | 0.11s
+Max RSS         | 0.02 MB              | 0.01 MB
+Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
+Inodes          | 1342                 | 2924
+Repo Size       | 19380 KB             | 28848 KB
+-------------------------------------------------------------
+
+SNAPSHOT: Commit #60
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.41s                | 0.09s
+Max RSS         | 0.02 MB              | 0.01 MB
+Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
+Inodes          | 1383                 | 3719
+Repo Size       | 19544 KB             | 35796 KB
+-------------------------------------------------------------
+
+SNAPSHOT: Commit #80
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.42s                | 0.11s
+Max RSS         | 0.02 MB              | 0.01 MB
+Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
+Inodes          | 1424                 | 4532
+Repo Size       | 19708 KB             | 42868 KB
+-------------------------------------------------------------
+
+SNAPSHOT: Commit #100
+-------------------------------------------------------------
+METRIC          | URN                  | GIT
+----------------+----------------------+---------------------
+Time            | 0.40s                | 0.10s
+Max RSS         | 0.02 MB              | 0.01 MB
+Page Faults     | Maj:0/Min:0          | Maj:0/Min:0
+Inodes          | 1464                 | 5341
+Repo Size       | 19868 KB             | 49840 KB
+-------------------------------------------------------------
+```
+
+Overall, git has the speed and memory advantage. Curiously though, on a cold
+start with the larger repo, urn's add + commit time beats git's, while git's
+aggressive zlib compression beats urn's disk usage.
+
+The performance profile over many commits tells a different story, however. Git's
+optimized C core consistently outperforms urn on speed. Urn's disk usage,
+which is what it was optimized for, is more stable than git's. In 80 commits
+(#20 to #100), git's repository grew by 27 MB, while urn's grew by only 0.6 MB.
+Git's inode use exploded from 2,122 to 5,341. Urn only went from 1,302 to 1,464.
+
+Verdict: viable.
+
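The 27 MB and 0.6 MB figures can be re-derived from the commit #20 and commit #100 snapshots in the history benchmark. The short script below only restates that arithmetic. `%repo_kb`, `%inodes`, and `growth` are hypothetical names, and it assumes 1 MB = 1024 KB and that repository growth is a fair proxy for data written.

```perl
use strict;
use warnings;

# Values copied from the history benchmark (commit #20 and commit #100).
my %repo_kb = (
    urn => { 20 => 19220, 100 => 19868 },
    git => { 20 => 21944, 100 => 49840 },
);
my %inodes = (
    urn => { 20 => 1302, 100 => 1464 },
    git => { 20 => 2122, 100 => 5341 },
);

# Growth between the two snapshots, i.e. over 80 commits.
sub growth {
    my ($table, $tool) = @_;
    return $table->{$tool}{100} - $table->{$tool}{20};
}

for my $tool (qw(git urn)) {
    my $kb = growth(\%repo_kb, $tool);
    printf "%s: +%d KB (%.1f MB), %.1f KB per commit, +%d inodes\n",
        $tool, $kb, $kb / 1024, $kb / 80, growth(\%inodes, $tool);
}
```

This works out to roughly 349 KB per commit for git against about 8 KB per commit for urn in this particular workload.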
