---
title: Built an experimental SSD-friendly VCS
date: 2026-04-23
layout: post
---

Implemented init, status, add, commit, log, show, and diff. Tracks regular
files and symlinks. Didn't bother with collaborative workflows.

Replaced the initial design, which mirrored the work tree with symlinks, with
a path-sorted index. This minimizes inode churn and avoids opening directories
on every status/add command.

Implemented the index to track the staged, committed, and base SHA-1 hashes of
each path, along with its mtime and size. Like git, used mtime and size to skip
entries that didn't change. Excluded file permissions.
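
A minimal sketch of the mtime/size shortcut, not Urn's actual code. The entry
layout (a hash with `size` and `mtime` keys) and the sub names are assumptions
made for illustration, and the hashing helper only covers regular files:

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical index entry: { size => ..., mtime => ... }.
# Returns true when the cached entry can't be trusted and the file
# has to be hashed again.
sub needs_rehash {
    my ($entry, $path) = @_;
    my @st = lstat($path) or return 1;    # missing file: treat as changed
    return $st[7] != $entry->{size} || $st[9] != $entry->{mtime};
}

# SHA-1 of a regular file's contents, read in binary mode.
sub file_sha1 {
    my ($path) = @_;
    return Digest::SHA->new(1)->addfile($path, "b")->hexdigest;
}
```

Only paths for which `needs_rehash` returns true pay for a read and a hash;
everything else costs a single `lstat`.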

Designed work/commit tree scans around a two-finger walk with the index. Linear
index access trades random-access speed for sequential IO and keeps the memory
footprint low.
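
A sketch of what a two-finger walk looks like, assuming both sides are already
sorted by path. The array-of-pairs input and the status labels are hypothetical
stand-ins; a streaming version would read one record at a time from each side
instead of holding arrays:

```perl
use strict;
use warnings;

# Walk two path-sorted lists in lockstep and classify each path.
# $index and $tree are array refs of [path, sha1] pairs, both sorted
# with the same byte-wise ordering that `cmp` uses.
sub two_finger_walk {
    my ($index, $tree) = @_;
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < @$index || $j < @$tree) {
        # An exhausted side always "loses" so the other side drains.
        my $cmp = $i >= @$index ? 1
                : $j >= @$tree  ? -1
                : $index->[$i][0] cmp $tree->[$j][0];
        if ($cmp < 0) {          # only in the index
            push @out, ["deleted", $index->[$i++][0]];
        }
        elsif ($cmp > 0) {       # only in the tree
            push @out, ["untracked", $tree->[$j++][0]];
        }
        else {                   # in both: compare hashes
            push @out, ["modified", $index->[$i][0]]
                if $index->[$i][1] ne $tree->[$j][1];
            $i++;
            $j++;
        }
    }
    return \@out;
}
```

Each list is visited once and only the two current records are compared, which
is why the walk needs no lookups into the index.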

Operations run in memory, using text streams and pipes wherever possible. Left
MEM_LIMIT configurable to fall back to disk for large repositories:

```
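use File::Temp qw(tempfile);
use POSIX ();

# Spill the in-memory buffer to a temp file, created on first use.
my $flush = sub {
    if (!$use_disk) {
        ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
        $tmp_fh->setvbuf(undef, POSIX::_IOFBF(), $chunk_size);
        binmode $tmp_fh, ":raw";
        $use_disk = 1;
    }
    print $tmp_fh @buf;
    # Reset the buffer so flushed lines aren't written again.
    @buf = ();
    $buf_size = 0;
};

push @buf, $line;
$buf_size += length($line);
$tot_size += length($line);

# Spill once the total exceeds MEM_LIMIT, then in $chunk_size batches.
if ((!$use_disk && $tot_size > MEM_LIMIT) ||
    ($use_disk && $buf_size > $chunk_size)) {
    $flush->();
}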
```

Implemented the commit command to atomically save (rename) staged files, the
tree, and the deltas to the object store. Bundled deltas into tarballs to
conserve inodes, and gzipped tarballs larger than 512 bytes. Objects in the
store are content-addressable.
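
A sketch of the write-then-rename pattern for a content-addressable store. The
flat directory layout and the `store_object` name are assumptions, and the
sketch skips fsync and the tarball bundling described above:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use File::Temp qw(tempfile);
use File::Path qw(make_path);

# Store $data under its SHA-1 inside $store_dir and return the hash.
sub store_object {
    my ($store_dir, $data) = @_;
    my $sha  = sha1_hex($data);
    my $dest = "$store_dir/$sha";
    return $sha if -e $dest;    # identical content is already stored

    make_path($store_dir);
    # Create the temp file in the same directory so rename() never
    # crosses a filesystem boundary.
    my ($fh, $tmp) = tempfile("tmp.XXXXXX", DIR => $store_dir);
    binmode $fh, ":raw";
    print {$fh} $data  or die "write $tmp: $!";
    close $fh          or die "close $tmp: $!";
    rename $tmp, $dest or die "rename $tmp -> $dest: $!";
    return $sha;
}
```

Readers see either no object or a complete one, because the final name only
appears once the data is fully written.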

Computed deltas against the first version of a file (base) to simplify
reconstruction via application of a single patch instead of delta chains. When
the delta outgrows the file, the file becomes the new base (a rebase).
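
The policy can be sketched as follows. The per-file record, the `store_version`
name, and the delta coderef are hypothetical, and the real tool works on files
in the object store rather than strings in memory:

```perl
use strict;
use warnings;

# Decide how to store a new version of one file.
# $entry      : hypothetical per-file record holding the current base content
# $make_delta : coderef ($base, $new) -> delta bytes (stand-in for a real diff)
sub store_version {
    my ($entry, $new, $make_delta) = @_;

    if (defined $entry->{base}) {
        my $delta = $make_delta->($entry->{base}, $new);
        # One patch against the base is enough to rebuild this version.
        return { kind => "delta", data => $delta }
            if length($delta) <= length($new);
    }

    # First version, or the delta outgrew the file: store it whole and
    # diff later versions against it.
    $entry->{base} = $new;
    return { kind => "base", data => $new };
}
```

Because every delta refers to the base directly, restoring any version costs
one base read plus at most one patch, however many commits came before it.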

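Unix diff doesn't compute binary deltas. Rolled a basic binary diff that
compares fixed-size blocks at matching offsets. It works well enough, except
that a small insertion or deletion shifts every later byte, so all the blocks
after it register as changed: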

```
my $patch = pack("Q", $new_size);    # Header: size of the new file
my $offset = 0;
while (1) {
    my $read_new = sysread($f_new, my $buf_new, $blk_size);
    my $read_old = sysread($f_old, my $buf_old, $blk_size);
    # sysread returns undef on error; don't mistake that for EOF
    die "read failed: $!" unless defined $read_new && defined $read_old;
    last if !$read_new && !$read_old;    # both files exhausted

    # If blocks differ, record the change
    if (($buf_new // '') ne ($buf_old // '')) {
        # Format: Offset (Q), Length (L), raw data
        $patch .= pack("QL", $offset, length($buf_new)) . $buf_new;
    }
    $offset += $blk_size;
}
```
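
Reconstruction is the inverse: start from the old bytes, overwrite each
recorded block, then cut to the size in the header. A sketch that works on
in-memory strings; `apply_patch` is a hypothetical name, and the `Q` template
needs a Perl built with 64-bit integer support:

```perl
use strict;
use warnings;

# Apply a patch in the format above to $old and return the new bytes.
sub apply_patch {
    my ($old, $patch) = @_;
    my $new_size = unpack("Q", $patch);
    my $pos = 8;                          # skip the 8-byte "Q" header
    my $new = $old;
    while ($pos < length $patch) {
        # Each record: 8-byte offset, 4-byte length, then the raw data.
        my ($offset, $len) = unpack("QL", substr($patch, $pos, 12));
        $pos += 12;
        # Guard against a gap past the current end (pad with NULs).
        $new .= "\0" x ($offset - length $new) if $offset > length $new;
        substr($new, $offset, $len) = substr($patch, $pos, $len);
        $pos += $len;
    }
    # Drop leftover old bytes when the new file is shorter.
    return substr($new, 0, $new_size);
}
```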

Benchmarks on T490 (i7-10510U, OpenBSD 7.8) against git v2.51.0:

<pre class="pre-no-style">
=============================================================
 REBASE BENCHMARK: 1000 files (100 commits)
 CONDITIONS: Depth=2, Files Mod=5%, Change=50%
 INITIAL RAW DATA SIZE: 16976 KB
=============================================================

SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.29s |                0.05s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1578 |                 2334
Repo size       |             20404 KB |             19380 KB
-------------------------------------------------------------

SNAPSHOT: Commit #40
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.54s |                0.05s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1607 |                 3374
Repo size       |             20520 KB |             23788 KB
-------------------------------------------------------------

SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.31s |                0.05s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1635 |                 4414
Repo size       |             20632 KB |             28196 KB
-------------------------------------------------------------

SNAPSHOT: Commit #80
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.29s |                0.05s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1664 |                 5454
Repo size       |             20748 KB |             32596 KB
-------------------------------------------------------------

SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.54s |                0.10s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1693 |                 6495
Repo size       |             20864 KB |             37008 KB
-------------------------------------------------------------

TOTAL URN REBASES: 273
</pre>

Git wins on speed and memory. On small repositories, Urn is competitive.

In high-revision workloads with modest per-file churn, Urn beats git on
storage. Between commits 20 and 100, git wrote 17 MB to track a 17 MB repo.
Urn wrote only 0.5 MB despite 273 rebases. Git's inode count went from 2,334
to 6,495. Urn's went from 1,578 to 1,693.
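
These figures come from subtracting the commit #20 snapshot from the
commit #100 snapshot in the table above. A small check of the arithmetic (the
`growth` helper is just for illustration):

```perl
use strict;
use warnings;

# Growth between the first (commit #20) and last (commit #100) snapshots.
sub growth {
    my ($first, $last) = @_;
    return $last - $first;
}

my %repo_kb = (urn => [20404, 20864], git => [19380, 37008]);
my %inodes  = (urn => [1578, 1693],   git => [2334, 6495]);

for my $vcs (sort keys %repo_kb) {
    printf "%s: +%d KB, +%d inodes\n", $vcs,
        growth(@{ $repo_kb{$vcs} }), growth(@{ $inodes{$vcs} });
}
# prints:
# git: +17628 KB, +4161 inodes
# urn: +460 KB, +115 inodes
```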

Git GC reclaims inodes, but doesn't save much space. On a 36.6 MB repo, git
used 1819 inodes and 59 MB before GC. After GC, the inode count dropped to
1514, but the size shrank by only 6 MB.

Commit: <a
href="https://git.asciimx.com/urn/commit/?id=49ae7748e4a95afa1fd9d08f4886952dfc1deca4"
class="external" target="_blank" rel="noopener noreferrer">49ae774</a>