---
title: 'Urn: Exploring an SSD-friendly VCS architecture'
date: 2026-04-20
layout: post
---
Git takes up 1819 inodes and 59 MB to track a 36.59 MB repository pre-GC. GC
collected 1514 inodes. Packing only reclaimed 6 MB. Can we do better?

The PoC implements status, add, commit, log, show, and diff, and supports
symlinks and binary files. The architecture tries to balance speed, memory, and
storage. It prioritizes SSD longevity (sequential reads and writes, reduced
TBW/WA) to the extent Perl and the OS let us.

A sorted index tracks files. Staging an object copies it to the staging area
and adds its path, mtime, size, and content hash (SHA-1) to the index:
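```
use File::Basename qw(dirname);
use File::Copy     qw(copy);
use File::Path     qw(make_path);
use File::Spec;

# Excerpt: $wrk_entry, $idx_entry, $out, TMP_DIR, hash_file_content() live elsewhere.
my $p            = $wrk_entry->{path};
my $current_hash = hash_file_content($p);
my $stg_path     = File::Spec->catfile(TMP_DIR, $p);
make_path(dirname($stg_path));

# Stage a copy; recreate symlinks rather than following them.
(-l $p)
    ? symlink(readlink($p), $stg_path)
    : copy($p, $stg_path);

# Index line: staged hash, c_hash, b_hash, mtime, size, path.
printf $out "%-40s\t%-40s\t%-40s\t%-12d\t%-10d\t%s\n",
    $current_hash, $idx_entry->{c_hash},
    $idx_entry->{b_hash}, $wrk_entry->{mtime},
    $wrk_entry->{size}, $p;
```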
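
Reading an entry back is roughly the mirror image of that `printf`. The sketch below is an assumption about how such a line could be parsed, not urn's actual reader: `parse_index_line` and the `s_hash` key are hypothetical names, and the field order is taken from the format string above.

```perl
use strict;
use warnings;

# Hypothetical parser for one index line written with the format
# "%-40s\t%-40s\t%-40s\t%-12d\t%-10d\t%s\n".
sub parse_index_line {
    my ($line) = @_;
    chomp $line;

    # Limit the split to 6 fields so the path keeps its content intact.
    my @f = split /\t/, $line, 6;

    # The first five fields are space-padded by printf; strip the padding.
    s/\s+\z// for @f[0 .. 4];

    my %entry;
    @entry{qw(s_hash c_hash b_hash mtime size path)} = @f;
    return \%entry;
}

# Round-trip a made-up entry through the same format string.
my $sample = sprintf "%-40s\t%-40s\t%-40s\t%-12d\t%-10d\t%s\n",
    'a' x 40, 'b' x 40, '', 1700000000, 1234, 'src/main.pl';
my $e = parse_index_line($sample);
print "$e->{path} size=$e->{size} mtime=$e->{mtime}\n";
```

Because the hash columns are fixed-width, an empty hash comes back as an empty string after the padding is stripped.
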

Urn doesn't use delta chains or a content-addressable store (CAS). It stores
one copy of each file (the base). Revisions are recorded as patches relative to
the base. When the patch outgrows the file, urn snapshots that file and tracks
future modifications relative to the new base.
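
The rebase rule can be sketched as a single size comparison. This is a minimal illustration, assuming "outgrows" means the patch is larger in bytes than the current base; `next_storage_step` is a hypothetical name, and urn's real condition may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: keep patching against the current base, or
# snapshot the file as a new base once the patch is bigger than it.
sub next_storage_step {
    my ($base_size, $patch_size) = @_;
    return $patch_size > $base_size ? 'snapshot' : 'patch';
}

print next_storage_step(4096, 300),  "\n";   # small edit
print next_storage_step(4096, 5000), "\n";   # patch has outgrown the file
```

Either way, only one patch per file has to be applied to reach a revision, which is what keeps reads cheap.
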

Commits record the directory structure and patchset. Patches are stored in a
tarball to minimize inode usage. Tarballs larger than 512 bytes are gzipped.
Compressing anything smaller isn't worth the effort. Trees are deduplicated by
content hash, so multiple commits can point to a single tree.
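
A rough sketch of tree deduplication and the gzip threshold, under stated assumptions: the tree is serialized here as a sorted list of paths, `%trees` and `%commits` stand in for the object store, and `tree_key`, `record_commit`, `should_gzip`, and `GZIP_MIN` are hypothetical names. Only the 512-byte threshold and the use of a content hash come from the text.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

use constant GZIP_MIN => 512;    # threshold from the text, in bytes

# Assumed serialization: one path per line, sorted.
sub serialize_tree { return join("\n", sort @_) . "\n" }
sub tree_key       { return sha1_hex(serialize_tree(@_)) }

my %trees;      # tree hash -> serialized tree (stand-in for the object store)
my %commits;    # commit id -> tree hash

sub record_commit {
    my ($id, @paths) = @_;
    my $key = tree_key(@paths);
    $trees{$key} //= serialize_tree(@paths);    # store each tree only once
    $commits{$id} = $key;
    return $key;
}

# Only tarballs above the threshold are worth compressing.
sub should_gzip {
    my ($bytes) = @_;
    return $bytes > GZIP_MIN ? 1 : 0;
}

record_commit('c1', 'README', 'lib/urn.pl');
record_commit('c2', 'lib/urn.pl', 'README');    # same tree, different order
printf "%d commits, %d stored tree(s)\n",
    scalar(keys %commits), scalar(keys %trees);
printf "gzip a 300-byte tarball? %s\n", should_gzip(300) ? 'yes' : 'no';
```

Two commits that leave the directory structure untouched share one stored tree, so a commit that only changes file contents costs no extra tree object.
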

The diff and show commands look up the revision, find its tree and patchset in
the object store, and apply the patch. The single-patch model keeps checkout at
O(1) patch applications per file regardless of history length, and stops one
corrupt patch from corrupting the rest of history.

The external tools urn uses (sort, diff, patch, tar, gzip) are all in the base
system. Urn talks to them over pipes and streams to keep the memory footprint
low. Most operations run in memory. MEM_LIMIT can be used to fall back to disk
when working with large repositories:
```
use constant MEM_LIMIT => 64 * 1024 * 1024;    # threshold for the disk fallback
use constant CHUNK_LEN => 8192;
use constant IO_LAYER  => ":raw:perlio(layer=" . CHUNK_LEN . ")";

# Excerpt: returns an iterator that yields entries in path order.
if (!$use_disk) {
    # In-memory path: sort the buffered lines directly.
    @buf = sort @buf;
    return sub {
        my $line = shift @buf;
        return unless $line;
        chomp $line;
        my ($p, $m, $s) = split(/\t/, $line);
        return { path => $p, mtime => $m, size => $s };
    };
} else {
    # Disk fallback: flush to the temp file and stream it through sort(1).
    $flush->() if @buf;
    close $tmp_fh;
    open(my $sort_fh, "-|", "sort", "-t", "\t", "-k1,1", $tmp_path) or die $!;
    return sub {
        my $line = <$sort_fh>;
        unless ($line) { close $sort_fh; return; }
        chomp $line;
        my ($p, $m, $s) = split(/\t/, $line);    # same field order as above
        return { path => $p, mtime => $m, size => $s };
    };
}
```

Index/tree processing is an O(N) two-finger walk. To keep IO transparent, both
are streamed line by line through carefully calibrated buffers instead of
mmap.
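
A two-finger walk over two path-sorted streams can be sketched as below. This is an illustration rather than urn's code: `make_iter` and `walk` are hypothetical names, the iterators mimic the closure style of the excerpt above, and the D/M/? labels are made up for the example.

```perl
use strict;
use warnings;

# Wrap a list of entries in an iterator closure, sorted by path.
sub make_iter {
    my @entries = sort { $a->{path} cmp $b->{path} } @_;
    return sub { return shift @entries };    # undef when exhausted
}

# Advance through both sorted streams in one pass: O(N) comparisons,
# and only one entry per stream is held at a time.
sub walk {
    my ($idx_it, $wrk_it) = @_;
    my ($i, $w) = ($idx_it->(), $wrk_it->());
    my @changes;
    while ($i || $w) {
        my $c = !$i ? 1 : !$w ? -1 : $i->{path} cmp $w->{path};
        if ($c < 0) {              # only in the index: deleted
            push @changes, "D $i->{path}";
            $i = $idx_it->();
        } elsif ($c > 0) {         # only in the working tree: untracked
            push @changes, "? $w->{path}";
            $w = $wrk_it->();
        } else {                   # in both: compare cheap metadata
            push @changes, "M $i->{path}"
                if $i->{mtime} != $w->{mtime} || $i->{size} != $w->{size};
            ($i, $w) = ($idx_it->(), $wrk_it->());
        }
    }
    return @changes;
}

my $index = make_iter(
    { path => 'a', mtime => 1, size => 10 },
    { path => 'b', mtime => 1, size => 10 },
    { path => 'c', mtime => 1, size => 10 },
);
my $work = make_iter(
    { path => 'a', mtime => 1, size => 10 },
    { path => 'c', mtime => 2, size => 12 },
    { path => 'd', mtime => 1, size => 5 },
);
print "$_\n" for walk($index, $work);
```

The walk only works if both streams use the same collation, so the sort order of the index and of the working-tree scan has to agree.
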
## Benchmarks
Performed on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:
### Impact of repository size
Small repo:
```
=============================================================
BENCHMARK: 200 files @ 20 depth
=============================================================
ACTION: Status
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.10s | 0.00s
Max RSS | 0.02 MB | 0.00 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 6 | 27
Repo size | 20 KB | 116 KB
-------------------------------------------------------------
ACTION: Add
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.16s | 0.17s
Max RSS | 0.02 MB | 0.00 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 225 | 360
Repo size | 3700 KB | 3348 KB
-------------------------------------------------------------
ACTION: Commit
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.17s | 0.04s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 347 | 397
Repo size | 4212 KB | 3496 KB
-------------------------------------------------------------
ACTION: Status (Clean)
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.10s | 0.01s
Max RSS | 0.02 MB | 0.00 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 347 | 397
Repo size | 4212 KB | 3496 KB
-------------------------------------------------------------
```
Larger repo:
```
=============================================================
BENCHMARK: 5000 files @ 50 depth
=============================================================
ACTION: Status
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.26s | 0.00s
Max RSS | 0.02 MB | 0.00 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 6 | 27
Repo size | 20 KB | 116 KB
-------------------------------------------------------------
ACTION: Add
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 2.82s | 4.62s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 5055 | 5284
Repo size | 89444 KB | 70360 KB
-------------------------------------------------------------
ACTION: Commit
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 1.18s | 0.93s
Max RSS | 0.03 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 5264 | 5342
Repo size | 91620 KB | 70592 KB
-------------------------------------------------------------
ACTION: Status (Clean)
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.34s | 0.10s
Max RSS | 0.02 MB | 0.01 MB
Page faults | Maj:0 / Min:0 | Maj:0 / Min:0
Inodes | 5264 | 5342
Repo size | 91620 KB | 70592 KB
-------------------------------------------------------------
```
### Impact of commits over time
Each commit modifies 2% of files. Modifications simulate small patches: a few
lines added, and a few lines deleted.
```
=============================================================
HISTORY BENCHMARK: 1000 files (100 commits)
=============================================================
SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.34s | 0.18s
Max RSS | 0.02 MB | 0.01 MB
Page Faults | Maj:0/Min:0 | Maj:0/Min:0
Inodes | 1302 | 2122
Repo Size | 19220 KB | 21944 KB
-------------------------------------------------------------
SNAPSHOT: Commit #40
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.41s | 0.11s
Max RSS | 0.02 MB | 0.01 MB
Page Faults | Maj:0/Min:0 | Maj:0/Min:0
Inodes | 1342 | 2924
Repo Size | 19380 KB | 28848 KB
-------------------------------------------------------------
SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.41s | 0.09s
Max RSS | 0.02 MB | 0.01 MB
Page Faults | Maj:0/Min:0 | Maj:0/Min:0
Inodes | 1383 | 3719
Repo Size | 19544 KB | 35796 KB
-------------------------------------------------------------
SNAPSHOT: Commit #80
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.42s | 0.11s
Max RSS | 0.02 MB | 0.01 MB
Page Faults | Maj:0/Min:0 | Maj:0/Min:0
Inodes | 1424 | 4532
Repo Size | 19708 KB | 42868 KB
-------------------------------------------------------------
SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC | URN | GIT
----------------+----------------------+---------------------
Time | 0.40s | 0.10s
Max RSS | 0.02 MB | 0.01 MB
Page Faults | Maj:0/Min:0 | Maj:0/Min:0
Inodes | 1464 | 5341
Repo Size | 19868 KB | 49840 KB
-------------------------------------------------------------
```

Overall, git has the speed and memory advantage. Curiously though, on a cold
start with the 5000-file repo, urn's combined add + commit time beats git's
(4.00s vs 5.55s), while git's aggressive zlib compression beats urn's disk
usage.

The performance profile over many commits tells a different story, however.
Git's optimized C core consistently outperforms urn on commit time. Urn's disk
usage, which is what it was optimized for, is more stable than git's. Between
commits #20 and #100, git's repository grew by 27 MB on disk, while urn's grew
by only 0.6 MB. Git's inode use exploded from 2,122 to 5,341. Urn's only went
from 1,302 to 1,464.
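
The growth figures can be rechecked from the history benchmark. The snippet below only does the subtraction; the numbers are copied from the commit #20 and commit #100 snapshots, and the `%snap` layout and `growth` helper are just for illustration.

```perl
use strict;
use warnings;

# [commit #20, commit #100] values from the history benchmark.
my %snap = (
    urn => { size_kb => [19220, 19868], inodes => [1302, 1464] },
    git => { size_kb => [21944, 49840], inodes => [2122, 5341] },
);

# Difference between the last and first snapshot for one tool.
sub growth {
    my ($tool) = @_;
    my ($s0, $s1) = @{ $snap{$tool}{size_kb} };
    my ($i0, $i1) = @{ $snap{$tool}{inodes} };
    return ($s1 - $s0, $i1 - $i0);
}

for my $tool (qw(urn git)) {
    my ($kb, $inodes) = growth($tool);
    printf "%s: +%d KB (%.1f MB), +%d inodes\n",
        $tool, $kb, $kb / 1024, $inodes;
}
```

That works out to 648 KB and 162 inodes for urn against 27896 KB and 3219 inodes for git over the same 80 commits.
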
Verdict: viable.