---
title: Built and benchmarked Urn against Git
date: 2026-05-01
layout: post
---

Implemented init, status, add, commit, log, show, and diff using Perl and
OpenBSD base-system tools. Didn't bother with collaborative workflows.

Initial design mirrored the work tree with symlinks. Using the filesystem as a
database felt clever, but walking directories on every command was untenable.
Replaced the symlink architecture with a path-sorted index.

The index tracks path, mtime, size, and SHA-1 hashes of staged, committed, and
base files. Only entries whose mtime and size changed (or whose mtime equals
that of the index itself, to mitigate races caused by mtime precision) are
hashed.
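
A minimal sketch of that check, not the code from the commit. The record
layout (`mtime`, `size` keys) and the name `needs_hash` are hypothetical, and
the sketch assumes a change in either field marks the entry dirty:

```perl
use strict;
use warnings;

# Decide whether a work-tree file must be rehashed.
#   $entry       - hypothetical index record: { mtime => ..., size => ... }
#   $st          - current stat of the file:  { mtime => ..., size => ... }
#   $index_mtime - mtime of the index file itself
sub needs_hash {
    my ($entry, $st, $index_mtime) = @_;
    return 1 if !defined $entry;                  # path not in the index yet
    return 1 if $st->{size}  != $entry->{size};   # ASSUMPTION: either field
    return 1 if $st->{mtime} != $entry->{mtime};  # changing counts as dirty
    # Racy case: the file may have been written in the same clock tick as
    # the index, so an unchanged mtime proves nothing. Hash to be safe.
    return 1 if $st->{mtime} == $index_mtime;
    return 0;
}
```

Everything else is trusted as clean without reading file contents.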

Implemented directory scans as a two-finger walk with the index; linear index
access trades random-access speed for sequential IO and keeps the memory
footprint low. 
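
The walk can be sketched as a merge over two sorted streams. This is an
illustration, not Urn's code: `walk`, `iter`, and the event names are
hypothetical, and both inputs are assumed to arrive in the same bytewise path
order:

```perl
use strict;
use warnings;

# Wrap a list in an iterator that returns the next item, or undef at the end.
sub iter { my @items = @_; return sub { return shift @items } }

# Two-finger walk: advance whichever side holds the smaller path, so each
# stream is read once, front to back, with one record per side in memory.
sub walk {
    my ($next_idx, $next_fs, $emit) = @_;
    my ($i, $f) = ($next_idx->(), $next_fs->());
    while (defined $i || defined $f) {
        if (!defined $f || (defined $i && $i lt $f)) {
            $emit->('deleted', $i);        # in the index, gone from disk
            $i = $next_idx->();
        } elsif (!defined $i || $f lt $i) {
            $emit->('untracked', $f);      # on disk, not in the index
            $f = $next_fs->();
        } else {
            $emit->('tracked', $i);        # same path on both sides
            $i = $next_idx->();
            $f = $next_fs->();
        }
    }
}

my @events;
walk(iter(qw(a.txt b.txt d.txt)),          # index paths
     iter(qw(b.txt c.txt d.txt)),          # work-tree paths
     sub { push @events, "$_[0] $_[1]" });
print "$_\n" for @events;
```

Neither side is ever loaded whole, which is where the low memory footprint
comes from.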

Commits save staged files, trees, and deltas to a content-addressable object
store. Deltas are bundled into tarballs to conserve inodes. Objects larger than
512 bytes are gzipped. The threshold was arbitrary. Did not tune further.
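
A rough sketch of a content-addressable write with the 512-byte gzip
threshold. The two-character fan-out directory, the `.gz` suffix, and the
names `put_object` and `GZIP_MIN` are assumptions for illustration; Urn's
actual layout may differ:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use IO::Compress::Gzip qw(gzip $GzipError);
use File::Path qw(make_path);

use constant GZIP_MIN => 512;    # bytes; threshold taken from the text above

# Store $data under its SHA-1 inside $root and return the hash.
sub put_object {
    my ($root, $data) = @_;
    my $hash = sha1_hex($data);
    my $dir  = "$root/" . substr($hash, 0, 2);    # hypothetical fan-out
    my $path = "$dir/" . substr($hash, 2);
    return $hash if -e $path || -e "$path.gz";    # identical content exists
    make_path($dir);
    if (length($data) > GZIP_MIN) {
        gzip(\$data => "$path.gz") or die "gzip failed: $GzipError";
    } else {
        open my $fh, '>:raw', $path or die "open $path: $!";
        print {$fh} $data;
        close $fh or die "close $path: $!";
    }
    return $hash;
}
```

Identical content hashes to the same path, so a repeated write is a no-op.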

Deltas, computed using diff, target the original file. Subsequent versions are
reconstructed via a single patch—no chains. Diff output is bloated but
compresses well. Rebase threshold is set to 1.4, assuming a 30-40% compression
ratio. When the delta exceeds the threshold, the file becomes the new base. 
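
One possible reading of the rebase rule, as a sketch. The text does not say
what the 1.4 is measured against; this assumes it is the ratio of raw delta
size to the size of the new file. `should_rebase` and `REBASE_THRESHOLD` are
hypothetical names:

```perl
use strict;
use warnings;

use constant REBASE_THRESHOLD => 1.4;    # value from the text above

# ASSUMPTION: the delta is compared against the size of the new file.
# Returns 1 when the file should be stored whole as the new base.
sub should_rebase {
    my ($delta_bytes, $file_bytes) = @_;
    return $delta_bytes > REBASE_THRESHOLD * $file_bytes ? 1 : 0;
}

# A 1000-byte file with a 1500-byte delta becomes the new base;
# the same file with a 300-byte delta stays a delta.
print should_rebase(1500, 1000), "\n";
print should_rebase(300, 1000), "\n";
```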

Commands run in memory, using text streams and pipes wherever possible. Left
MEM_LIMIT configurable to fall back to disk for large repositories:

```
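# Spill buffered lines to a temp file (tempfile() comes from File::Temp).
my $flush = sub {
    if (!$use_disk) {
        # First spill: create the temp file and give it a large buffer.
        ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
        $tmp_fh->setvbuf(undef, POSIX::_IOFBF(), $chunk_size);
        binmode $tmp_fh, ":raw";
        $use_disk = 1;
    }
    print $tmp_fh @buf;
    # Reset the buffer so flushed lines are not written again.
    @buf = ();
    $buf_size = 0;
};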

push @buf, $line;
$buf_size += length($line);
$tot_size += length($line);

if ((!$use_disk && $tot_size > MEM_LIMIT) || 
    ($use_disk && $buf_size > $chunk_size)) {
    $flush->();
}
```

Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8). Measured
with `/usr/bin/time -l sh -c`. Max RSS excludes child processes:

<pre class="pre-no-style">
=============================================================
 COMMIT BENCHMARK: 1000 files (100 commits)
 CONDITIONS: Depth=2, Files Mod=0.5%, Line Mod=5%
 INITIAL REPO SIZE: 17332 KB
=============================================================

SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.29s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1300 |                 1425
Repo size       |              6836 KB |              8296 KB
-------------------------------------------------------------

SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.35s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1381 |                 1706
Repo size       |              7896 KB |             10236 KB
-------------------------------------------------------------

SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.35s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1462 |                 1987
Repo size       |              9020 KB |             12168 KB
-------------------------------------------------------------

AFTER GIT GC
-------------------------------------------------------------
Final Size      |              9020 KB |              3812 KB
Final Inodes    |                 1462 |                   41
-------------------------------------------------------------

TOTAL URN REBASES: 0
</pre>

Git is roughly 10x faster: 0.03s per commit against 0.29-0.35s.

On storage, Urn shows promise. Git wrote 12 MB to track a 17 MB repository; Urn
wrote 9 MB. Over 80 commits, Git's inode consumption grew by 562, while Urn's
crept from 1,300 to 1,462.

Then fell the GC hammer. Inodes: 41. Space recovered: 8.4 MB.

Precise impact on TBW and write amplification is unknown.

Commit: <a
href="https://git.asciimx.com/urn/commit/?id=ff98b5711ae91d5cafd75764be192c0be5e592cf"
class="external" target="_blank" rel="noopener noreferrer">ff98b57</a>