---
title: Built and benchmarked Urn against Git
date: 2026-05-01
layout: post
---
Implemented init, status, add, commit, log, show, and diff commands. Depends
only on OpenBSD base system tools. Didn't bother with collaborative workflows.

Initial design mirrored the work tree using symlinks. Using the filesystem as a
database felt clever, but walking directories on every command and the inode
churn were untenable. Replaced the symlink architecture with a path-sorted
index.

The index tracks path, mtime, size, and SHA-1 hashes of the staged, committed,
and base versions of each file. Hashing is skipped when mtime and size are
unchanged. If a file's mtime matches the index's own timestamp, the file is
rehashed anyway to catch sub-second changes.

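
A minimal Perl sketch of that stat check, assuming a hypothetical tab-separated index line of `path`, `mtime`, `size`, and a single `sha1` (the real index also carries committed and base hashes, and its layout isn't shown in this post). `parse_entry`, `needs_rehash`, and `current_hash` are illustrative names, not Urn's.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical index line: "path\tmtime\tsize\tsha1\n".
sub parse_entry {
    my ($line) = @_;
    chomp $line;
    my ($path, $mtime, $size, $sha1) = split /\t/, $line;
    return { path => $path, mtime => $mtime, size => $size, sha1 => $sha1 };
}

# True when the cached hash can't be trusted and the file must be read.
# $index_mtime is the mtime of the index file itself.
sub needs_rehash {
    my ($entry, $mtime, $size, $index_mtime) = @_;
    return 1 if $mtime != $entry->{mtime} || $size != $entry->{size};
    # Racy case: the file was touched in the same second the index was
    # written, so a later edit in that second could keep mtime and size.
    return 1 if $mtime == $index_mtime;
    return 0;
}

# Return the file's current SHA-1, reusing the cached one when allowed.
sub current_hash {
    my ($entry, $index_mtime) = @_;
    my @st = stat($entry->{path}) or die "stat $entry->{path}: $!";
    my ($size, $mtime) = @st[7, 9];
    return $entry->{sha1}
        unless needs_rehash($entry, $mtime, $size, $index_mtime);
    open my $fh, '<:raw', $entry->{path} or die "open $entry->{path}: $!";
    my $digest = Digest::SHA->new(1)->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}
```

Passing the index file's own mtime as `$index_mtime` is what catches the same-second case: Perl's `stat` reports whole seconds, so two writes within one second can leave both mtime and size unchanged.
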
Implemented directory scans as a two-finger walk against the sorted index.
Linear index access trades random-access speed for sequential IO and keeps the
memory footprint low.

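
The two-finger walk amounts to merging two sorted lists. Below is a sketch, assuming the scan already yields full paths in the same `cmp` order as the index. `two_finger_walk` is a hypothetical helper. Each finger only moves forward, so the pass is linear. In Urn the index side would presumably be read line by line rather than held in an array. Arrays just keep the sketch short.

```perl
use strict;
use warnings;

# Merge two path lists sorted with cmp in a single linear pass.
# Returns paths found only on disk (untracked), only in the index
# (deleted), and on both sides (to be stat-checked).
sub two_finger_walk {
    my ($disk, $index) = @_;    # array refs
    my (@untracked, @deleted, @both);
    my ($i, $j) = (0, 0);
    while ($i < @$disk && $j < @$index) {
        my $c = $disk->[$i] cmp $index->[$j];
        if    ($c < 0) { push @untracked, $disk->[$i++] }
        elsif ($c > 0) { push @deleted,   $index->[$j++] }
        else           { push @both, $disk->[$i]; $i++; $j++ }
    }
    # Whatever remains on either side has no partner.
    push @untracked, @{$disk}[$i .. $#{$disk}];
    push @deleted,   @{$index}[$j .. $#{$index}];
    return (\@untracked, \@deleted, \@both);
}
```

For example, a disk listing of `README a/x.c b.txt` against an index of `README a/y.c b.txt` yields `a/x.c` as untracked, `a/y.c` as deleted, and `README b.txt` for the stat check.
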
Commits save staged files, trees, and deltas to the content-addressable object
store. Bundled deltas into tarballs to conserve inodes. Gzipped objects larger
than 512 bytes. The threshold was arbitrary. Did not tune further.

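
A sketch of a content-addressed write with the 512-byte gzip cutoff. It rests on several assumptions. Objects live under a hypothetical `<store>/<first 2 hex>/<remaining 38 hex>` fan-out. The ID is the SHA-1 of the uncompressed content. Compression uses the core `IO::Compress::Gzip` module rather than whatever Urn actually invokes. Tarball bundling of deltas is left out. `put_object` is an illustrative name.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use File::Path qw(make_path);
use IO::Compress::Gzip qw(gzip $GzipError);

use constant GZIP_MIN => 512;    # objects larger than this are gzipped

# Store $data (a byte string) under its SHA-1 and return the ID.
# Hashing the uncompressed bytes means compression never changes
# an object's address.
sub put_object {
    my ($store, $data) = @_;
    my $id   = sha1_hex($data);
    my $dir  = "$store/" . substr($id, 0, 2);
    my $path = "$dir/" . substr($id, 2);
    return $id if -e $path;    # identical content is already stored
    make_path($dir);
    my $bytes = $data;
    if (length($data) > GZIP_MIN) {
        gzip(\$data => \$bytes) or die "gzip failed: $GzipError";
    }
    open my $fh, '>:raw', $path or die "open $path: $!";
    print {$fh} $bytes;
    close $fh or die "close $path: $!";
    return $id;
}
```
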
Deltas, computed using diff, target the original file. Each subsequent version
is reconstructed from the base with a single patch. No chains. When the delta
exceeds the rebase threshold, the file becomes the new base. Diff output is
bloated but compresses well, so the rebase threshold is set to 1.4, assuming a
30-40% compression ratio.

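
The rebase rule reduces to one comparison. Below is a sketch, assuming the threshold is the ratio of the uncompressed delta size to the new file's size. The post doesn't say which size the ratio is measured against. `storage_kind` is a hypothetical name. Because every delta targets the base, reconstructing any version means applying exactly one patch to it.

```perl
use strict;
use warnings;

use constant REBASE_THRESHOLD => 1.4;

# Decide how to store a new version, given its size and the size of
# its diff(1) output against the current base (assumed uncompressed).
# Returns 'base' to store the full file as the new base, or 'delta'
# to keep a single patch against the existing base.
sub storage_kind {
    my ($file_size, $delta_size) = @_;
    return 'base' if $file_size == 0;    # an empty base costs nothing
    return $delta_size / $file_size > REBASE_THRESHOLD ? 'base' : 'delta';
}
```
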
Commands run in memory, using text streams and pipes wherever possible. Left
MEM_LIMIT configurable to fall back to disk for large repositories:

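```perl
use File::Temp qw(tempfile);
use IO::Handle;
use POSIX ();

# Declared elsewhere: MEM_LIMIT, $chunk_size, $use_disk, $tmp_fh,
# $tmp_path, @buf, $buf_size, $tot_size.

# Spill @buf to a temp file: first when the total crosses MEM_LIMIT,
# then whenever the pending chunk exceeds $chunk_size.
my $flush = sub {
    if (!$use_disk) {
        ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
        $tmp_fh->setvbuf(undef, POSIX::_IOFBF(), $chunk_size);
        binmode $tmp_fh, ":raw";
        $use_disk = 1;
    }
    print $tmp_fh @buf;
    # Empty the buffer so flushed lines are not written again.
    @buf      = ();
    $buf_size = 0;
};

# Per incoming $line:
push @buf, $line;
$buf_size += length($line);
$tot_size += length($line);
if ((!$use_disk && $tot_size > MEM_LIMIT) ||
    ($use_disk && $buf_size > $chunk_size)) {
    $flush->();
}
```
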
Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8):

<pre class="pre-no-style">
=============================================================
COMMIT BENCHMARK: 1000 files (100 commits)
CONDITIONS: Depth=2, Files Mod=0.5%, Line Mod=5%
INITIAL REPO SIZE: 17332 KB
=============================================================
SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.29s | 0.03s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 1300 | 1425
Repo size | 6836 KB | 8296 KB
-------------------------------------------------------------
SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.35s | 0.03s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 1381 | 1706
Repo size | 7896 KB | 10236 KB
-------------------------------------------------------------
SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.35s | 0.03s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 1462 | 1987
Repo size | 9020 KB | 12168 KB
-------------------------------------------------------------
AFTER GIT GC
-------------------------------------------------------------
Final Size | 9020 KB | 3812 KB
Final Inodes | 1462 | 41
-------------------------------------------------------------
TOTAL URN REBASES: 0
</pre>
Git wins on speed and memory.

On storage, Urn shows promise. By commit #100, Git had written 12 MB to track a
17 MB repository; Urn had written 9 MB. Between commits #20 and #100, Git's
inode count grew by 562, while Urn's crept from 1,300 to 1,462.

Then fell the GC hammer. Inodes: 41. Space recovered: 8.4 MB.

The precise impact on TBW and write amplification remains unknown.

Commit: <a
href="https://git.asciimx.com/urn/commit/?id=79d9ec2bdef0a82172fa0aa56f12004bef206c04"
class="external" target="_blank" rel="noopener noreferrer">79d9ec2</a>