Diffstat (limited to '_log/site-search.md')
| -rw-r--r-- | _log/site-search.md | 23 |
1 file changed, 13 insertions, 10 deletions
diff --git a/_log/site-search.md b/_log/site-search.md
index dab5133..4ccae3c 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -4,7 +4,7 @@ date: 2026-01-03
 layout: post
 ---
 
-Article count is growing. Need a way to search.
+Article count is growing. Need search.
 
 Requirements: matches substrings, case-insensitive, fast, secure. No
 JavaScript.
@@ -14,7 +14,7 @@ Architecture: browser → httpd → slowcgi → Perl CGI script.
 Perl, httpd, slowcgi are in the OpenBSD base system. Instead of secrets, file
 system permissions govern access.
 
-2025-12-30: Regex search.
+2025-12-30: Regex.
 
 140-line Perl script searches 500 files in 40ms. Fast enough; O(N) pull felt at
 higher file counts.
@@ -22,7 +22,7 @@ higher file counts.
 Introduces ReDoS and symlink attack vectors. Both can be mitigated. Tempted to
 stop here.
 
-2026-01-03: Suffix Array (SA) based index lookup.
+2026-01-03: Suffix array (SA) based index lookup.
 
 Slurping files on every request bothers me. Regex search depends almost
 entirely on hardware for speed.
@@ -36,8 +36,8 @@ $ cd cgi-bin/
 $ perl indexer.pl
 ```
 
-Indexer extracts HTML, lowercases, encodes into UTF-8 binary sequences. Null
-byte sentinel for document boundaries. sa.bin stores suffix offsets as
+Indexer extracts HTML, lowercases, and encodes into UTF-8 binary sequences.
+Null byte sentinel for document boundaries. sa.bin stores suffix offsets as
 32-bit unsigned integers, sorted by lexicographical order:
 
 ```
@@ -120,9 +120,12 @@ Resource exhaustion and XSS attacks are inherent. Former mitigated by limiting
 concurrent searches via lock-file semaphores. Query length (64B) and result set
 (20) capped. All output is HTML-escaped to prevent XSS.
 
-Secure by default. Fast. Durable.
+Verdict: Fast. Durable. Secure by default.
+
+Commit: <a
+href="https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e"
+class="external" target="_blank" rel="noopener noreferrer">6da102d</a> |
+Benchmarks: <a
+href="https://git.asciimx.com/site-search-bm/commit/?id=8a4da6809cf9368cd6a5dd7351181ea4256453f9"
+class="external" target="_blank" rel="noopener noreferrer">8a4da68</a>
 
-Commit:
-[6da102d](https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e)
-| Benchmarks:
-[8a4da68](https://git.asciimx.com/site-search-bm/commit/?id=8a4da6809cf9368cd6a5dd7351181ea4256453f9) |
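The suffix-array scheme described in the diff above uses lowercased UTF-8 text, a null-byte sentinel between documents, and suffix offsets stored in sa.bin as 32-bit unsigned integers in lexicographic order. It can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's Perl indexer. The function names, the little-endian byte order, the naive sort-based construction, and the choice to return distinct document ids are all hypothetical. HTML-to-text extraction is omitted, so the input is assumed to be plain text. Lookup is a binary search for the first suffix that begins with the query, followed by a scan over the contiguous run of matching suffixes.

```python
import bisect
import struct


def build_corpus(docs):
    """Lowercase each document, encode it as UTF-8, and append a null-byte sentinel."""
    parts, starts, pos = [], [], 0
    for text in docs:
        data = text.lower().encode("utf-8")
        starts.append(pos)  # byte offset where this document begins
        parts.append(data + b"\x00")
        pos += len(data) + 1
    return b"".join(parts), starts


def build_suffix_array(corpus):
    """Naive construction: sort every suffix offset by the bytes that follow it."""
    return sorted(range(len(corpus)), key=lambda i: corpus[i:])


def pack_offsets(sa):
    """Serialize offsets as 32-bit unsigned little-endian integers (assumed sa.bin layout)."""
    return struct.pack(f"<{len(sa)}I", *sa)


def unpack_offsets(blob):
    return list(struct.unpack(f"<{len(blob) // 4}I", blob))


def search(corpus, sa, starts, query, limit=20):
    """Return up to `limit` sorted document ids whose text contains `query`."""
    q = query.lower().encode("utf-8")
    if not q or b"\x00" in q:
        return []
    n = len(q)
    # Binary search for the first suffix whose first n bytes are >= q.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if corpus[sa[mid]:sa[mid] + n] < q:
            lo = mid + 1
        else:
            hi = mid
    # Every suffix that starts with q sits in one contiguous run beginning at lo.
    hits = set()
    i = lo
    while i < len(sa) and corpus[sa[i]:sa[i] + n] == q and len(hits) < limit:
        # Map the byte offset back to the document that contains it.
        hits.add(bisect.bisect_right(starts, sa[i]) - 1)
        i += 1
    return sorted(hits)
```

The query can never contain the null sentinel, so a match cannot straddle two documents. The naive sort copies every suffix, which makes it suitable only for small corpora.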

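The last hunk lists the mitigations: lock-file semaphores limit concurrent searches, the query is capped at 64 bytes, the result set is capped at 20, and all output is HTML-escaped. These can be sketched as below. The post does not say how the semaphore is built. Here it is assumed to be N slot files, each guarded by a non-blocking exclusive flock(2), so a request that finds no free slot is turned away. The lock file names, the slot count, and truncating (rather than rejecting) long queries are hypothetical choices. The original is a Perl CGI script, not Python.

```python
import fcntl
import html
import os

MAX_QUERY_BYTES = 64  # query length cap stated in the post
MAX_RESULTS = 20      # result set cap stated in the post


def acquire_slot(lock_dir, slots):
    """Return a locked fd for the first free slot file, or None if all are busy."""
    for n in range(slots):
        # Hypothetical naming scheme: one lock file per concurrent-search slot.
        path = os.path.join(lock_dir, f"search.{n}.lock")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except (BlockingIOError, PermissionError):
            os.close(fd)  # slot held by another request; try the next one
    return None


def release_slot(fd):
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def cap_query(raw):
    """Truncate to MAX_QUERY_BYTES of UTF-8, dropping a character split by the cut."""
    return raw.encode("utf-8")[:MAX_QUERY_BYTES].decode("utf-8", errors="ignore")


def render_results(titles):
    """Escape every title and keep at most MAX_RESULTS of them."""
    items = "".join(f"<li>{html.escape(t)}</li>" for t in titles[:MAX_RESULTS])
    return f"<ul>{items}</ul>"
```

A CGI handler would call acquire_slot first, respond with an error (for example HTTP 503) when it returns None, and call release_slot in a finally block. The kernel also drops a flock lock when the holding process exits, so a crashed request cannot leak a slot.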