---
title: Implemented an experimental SSD-friendly VCS
date: 2026-04-23
layout: post
---
Implemented init, status, add, commit, log, show, and diff. Tracks regular
files and symlinks. Didn't bother with collaborative workflows.

Moved away from the initial design, which mirrored the work tree with symlinks,
to an index. The index minimizes inode churn and directory opens on every
status/add command.

Implemented a path-sorted index that tracks staged, commit, and base SHA-1
hashes, mtime, and size. As in git, mtime and size are used to skip entries
that didn't change. File permissions are not tracked.

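
A minimal sketch of the mtime/size shortcut, assuming a hypothetical in-memory index entry with `mtime`, `size`, and `staged` fields (the real index layout isn't shown in this post). The helper name `current_hash` is made up, and the sketch handles regular files only. The file is re-hashed only when one of the stat fields differs:

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: return the SHA-1 for $path, reusing the hash
# recorded in the index entry when mtime and size are unchanged.
# Second return value is 1 if the file had to be re-hashed.
sub current_hash {
    my ($path, $entry) = @_;
    my @st = stat($path) or die "stat $path: $!";
    my ($size, $mtime) = @st[7, 9];
    if ($entry && $entry->{mtime} == $mtime && $entry->{size} == $size) {
        return ($entry->{staged}, 0);    # unchanged: trust the index
    }
    my $sha = Digest::SHA->new(1)->addfile($path, "b")->hexdigest;
    return ($sha, 1);
}
```

The shortcut trades a full read of every file for one stat call per file, at the cost of missing an edit that preserves both size and mtime.
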
When a new file is committed, it’s saved in the object store as the base. When
the file changes, diff generates a patch against the base. If the patch is
larger than the file, the file becomes the new base (counted as a rebase in the
benchmarks below).

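
The base-or-patch rule fits in a few lines. This is a sketch only: `choose_storage` and the returned hash layout are hypothetical names, not taken from the Urn source.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to store for a changed file.
# $patch is the delta against the current base, $content the new file body.
sub choose_storage {
    my ($patch, $content) = @_;
    return length($patch) > length($content)
        ? { kind => 'base',  data => $content }    # patch too big: rebase
        : { kind => 'patch', data => $patch };     # keep the existing base
}
```
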
Unix diff doesn't compute binary deltas. Rolled a basic block-wise binary diff.
It works well enough, except that a small insertion or deletion shifts the
following bytes, so every later block registers as changed:
```
my $patch = pack("Q", $new_size);
while (1) {
    my $read_new = sysread($f_new, my $buf_new, $blk_size);
    my $read_old = sysread($f_old, my $buf_old, $blk_size);
    last if !$read_new && !$read_old;
    # If blocks differ, record the change
    if (($buf_new // '') ne ($buf_old // '')) {
        # Format: Offset (Q), Length (L), raw data
        $patch .= pack("QL", $offset, length($buf_new)) . $buf_new;
    }
    $offset += $blk_size;
}
```
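
For the reverse direction, here is a sketch of applying a patch in the format above: an 8-byte size header followed by offset/length/data records. It works on in-memory strings rather than file handles, and `make_patch` is a string-based stand-in for the loop above so the round trip can be checked. Both function names are assumptions, and `Q` needs a Perl built with 64-bit integer support.

```perl
use strict;
use warnings;

# In-memory variant of the block diff above (hypothetical helper).
sub make_patch {
    my ($old, $new, $blk) = @_;
    my $patch = pack("Q", length $new);
    my $max = length($old) > length($new) ? length($old) : length($new);
    for (my $off = 0; $off < $max; $off += $blk) {
        my $b_new = $off < length($new) ? substr($new, $off, $blk) : '';
        my $b_old = $off < length($old) ? substr($old, $off, $blk) : '';
        $patch .= pack("QL", $off, length $b_new) . $b_new if $b_new ne $b_old;
    }
    return $patch;
}

# Rebuild the new content from the old content and a patch.
sub apply_patch {
    my ($old, $patch) = @_;
    my $new_size = unpack("Q", $patch);
    my $pos = 8;                               # skip the size header
    my $out = $old;
    while ($pos < length $patch) {
        my ($offset, $len) = unpack("QL", substr($patch, $pos, 12));
        $pos += 12;
        # Pad with NULs if a record starts past the current end
        $out .= "\0" x ($offset - length $out) if $offset > length $out;
        substr($out, $offset, $len, substr($patch, $pos, $len));
        $pos += $len;
    }
    return substr($out, 0, $new_size);         # drop bytes past the new size
}
```

With a block size of 4, changing one byte of `aaaabbbbcccc` produces a single 16-byte record. Prepending one byte to `abcdefgh` makes all three blocks differ, and the patch (53 bytes) ends up larger than the 9-byte file, which is the shifting problem described above.
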
A commit is a tree (a list of paths and their base hashes) and a patch set.
Patches are stored as tarballs; tarballs larger than 512 bytes are gzipped.

Objects in the store are content-addressable. To reconstruct a file, look up
the revision file, follow the base hash, and apply the patch.

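
A rough sketch of content-addressed storage with a size-based gzip threshold. A hash ref stands in for the object directory, and `put_object`, `get_object`, and the `gz` flag are hypothetical. It only illustrates keying by SHA-1 and compressing above 512 bytes, not Urn's actual revision-file or tarball layout.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

use constant GZIP_MIN => 512;    # threshold taken from the text above

# Store $data under the SHA-1 of its uncompressed content.
sub put_object {
    my ($store, $data) = @_;
    my $key = sha1_hex($data);
    return $key if exists $store->{$key};    # identical content stored once
    if (length($data) > GZIP_MIN) {
        my $packed;
        gzip(\$data => \$packed) or die "gzip failed: $GzipError";
        $store->{$key} = { gz => 1, body => $packed };
    } else {
        $store->{$key} = { gz => 0, body => $data };
    }
    return $key;
}

# Fetch an object by key, decompressing if it was stored gzipped.
sub get_object {
    my ($store, $key) = @_;
    my $obj = $store->{$key} or die "missing object $key";
    return $obj->{body} unless $obj->{gz};
    my $data;
    gunzip(\$obj->{body} => \$data) or die "gunzip failed: $GunzipError";
    return $data;
}
```

Skipping compression for tiny objects avoids the fixed gzip header and trailer overhead, which can make a small payload larger than the original.
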
Status and add commands scan the work tree, sort entries by path, and perform a
two-finger walk with the index. Linear index access trades random-access speed
for sequential IO.

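
The two-finger walk can be sketched as a merge over two path-sorted lists. The `[path, hash]` pair layout and the `A`/`D`/`M` labels are assumptions for illustration, not Urn's actual output format.

```perl
use strict;
use warnings;

# $work and $index: array refs of [path, hash] pairs, both sorted by path.
# Returns one line per difference: A (only in work tree), D (only in
# index), M (present in both, hash differs).
sub walk_status {
    my ($work, $index) = @_;
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < @$work || $j < @$index) {
        my $cmp = $i >= @$work  ?  1
                : $j >= @$index ? -1
                : $work->[$i][0] cmp $index->[$j][0];
        if ($cmp < 0) {
            push @out, "A $work->[$i][0]";
            $i++;
        } elsif ($cmp > 0) {
            push @out, "D $index->[$j][0]";
            $j++;
        } else {
            push @out, "M $work->[$i][0]"
                if $work->[$i][1] ne $index->[$j][1];
            $i++;
            $j++;
        }
    }
    return @out;
}
```

Each list is read once from front to back, so the index can be streamed from disk without seeking.
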
Operations are performed in memory, often using text streams and pipes. For
large repositories, MEM_LIMIT caps the in-memory buffer; past it, data is
flushed to a temp file in chunks:
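```
# Needs: use File::Temp qw(tempfile); use POSIX ();
my $flush = sub {
    if (!$use_disk) {
        # First spill: create the temp file (removed on exit)
        ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
        $tmp_fh->setvbuf(undef, POSIX::_IOFBF(), $chunk_size);
        binmode $tmp_fh, ":raw";
        $use_disk = 1;
    }
    print $tmp_fh @buf;
    # Reset the buffer so flushed lines aren't written twice
    @buf = ();
    $buf_size = 0;
};
push @buf, $line;
$buf_size += length($line);
$tot_size += length($line);
# Spill once the total passes MEM_LIMIT, then again every $chunk_size bytes
if ((!$use_disk && $tot_size > MEM_LIMIT) ||
    ($use_disk && $buf_size > $chunk_size)) {
    $flush->();
}
```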
Performed benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
<pre class="pre-no-style">
=============================================================
BENCHMARK: 200 files @ 20 depth
=============================================================
ACTION: Status
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.10s                | 0.00s
Max RSS         | 0.02 MB              | 0.00 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 6                    | 27
Repo size       | 20 KB                | 116 KB
-------------------------------------------------------------
ACTION: Add
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.16s                | 0.17s
Max RSS         | 0.02 MB              | 0.00 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 225                  | 360
Repo size       | 3700 KB              | 3348 KB
-------------------------------------------------------------
ACTION: Commit
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.17s                | 0.04s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 347                  | 397
Repo size       | 4212 KB              | 3496 KB
-------------------------------------------------------------
ACTION: Status (Clean)
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.10s                | 0.01s
Max RSS         | 0.02 MB              | 0.00 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 347                  | 397
Repo size       | 4212 KB              | 3496 KB
-------------------------------------------------------------
=============================================================
BENCHMARK: 5000 files @ 50 depth
=============================================================
ACTION: Status
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.26s                | 0.00s
Max RSS         | 0.02 MB              | 0.00 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 6                    | 27
Repo size       | 20 KB                | 116 KB
-------------------------------------------------------------
ACTION: Add
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 2.82s                | 4.62s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 5055                 | 5284
Repo size       | 89444 KB             | 70360 KB
-------------------------------------------------------------
ACTION: Commit
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 1.18s                | 0.93s
Max RSS         | 0.03 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 5264                 | 5342
Repo size       | 91620 KB             | 70592 KB
-------------------------------------------------------------
ACTION: Status (Clean)
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.34s                | 0.10s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 5264                 | 5342
Repo size       | 91620 KB             | 70592 KB
-------------------------------------------------------------
=============================================================
REBASE BENCHMARK: 1000 files (100 commits)
CONDITIONS: Depth=2, Files Mod=5%, Change=50%
INITIAL RAW DATA SIZE: 16976 KB
=============================================================
SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.29s                | 0.05s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 1578                 | 2334
Repo size       | 20404 KB             | 19380 KB
-------------------------------------------------------------
SNAPSHOT: Commit #40
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.54s                | 0.05s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 1607                 | 3374
Repo size       | 20520 KB             | 23788 KB
-------------------------------------------------------------
SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.31s                | 0.05s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 1635                 | 4414
Repo size       | 20632 KB             | 28196 KB
-------------------------------------------------------------
SNAPSHOT: Commit #80
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.29s                | 0.05s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 1664                 | 5454
Repo size       | 20748 KB             | 32596 KB
-------------------------------------------------------------
SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC          | URN                  | GIT
----------------+----------------------+---------------------
Time            | 0.54s                | 0.10s
Max RSS         | 0.02 MB              | 0.01 MB
Page faults     | Maj:0 / Min:0        | Maj:0 / Min:0
Inodes          | 1693                 | 6495
Repo size       | 20864 KB             | 37008 KB
-------------------------------------------------------------
TOTAL URN REBASES: 273
</pre>
Git wins on speed and memory, except for add, where Urn was faster in both
runs. On small repositories, Urn is competitive.

In high-revision workloads with modest per-file churn, Urn beats git on
storage. Between commits #20 and #100, git wrote 17 MB to track a 17 MB repo.
Urn wrote only 0.5 MB despite 273 rebases. Git's inode count went from 2,334 to
6,495. Urn's went from 1,578 to 1,693.

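
The storage figures can be re-derived from the rebase benchmark. A small sketch using the repo sizes at commits #20 and #100 from the table; the helper name `growth_kb` is made up here:

```perl
use strict;
use warnings;

# Repo size in KB at commit #20 and commit #100, copied from the table.
my %repo_kb = (
    urn => [20404, 20864],
    git => [19380, 37008],
);

# Hypothetical helper: KB written between the two snapshots.
sub growth_kb {
    my ($tool) = @_;
    my ($first, $last) = @{ $repo_kb{$tool} };
    return $last - $first;
}

printf "%s: %d KB (%.2f MB)\n", $_, growth_kb($_), growth_kb($_) / 1024
    for sort keys %repo_kb;
```

This prints 17628 KB (17.21 MB) for git and 460 KB (0.45 MB) for Urn, in line with the rounded figures above.
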
Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git
used 1819 inodes and 59 MB before GC. After GC, the inode count dropped to
1514, but disk usage shrank by only 6 MB.

Commit: <a
href="https://git.asciimx.com/urn/commit/?id=49ae7748e4a95afa1fd9d08f4886952dfc1deca4"
class="external" target="_blank" rel="noopener noreferrer">49ae774</a>