Diffstat (limited to '_log/site-search.md')
-rw-r--r-- _log/site-search.md 23
1 files changed, 13 insertions, 10 deletions
diff --git a/_log/site-search.md b/_log/site-search.md
index dab5133..4ccae3c 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -4,7 +4,7 @@ date: 2026-01-03
layout: post
---
-Article count is growing. Need a way to search.
+Article count is growing. Need search.
Requirements: matches substrings, case-insensitive, fast, secure. No
JavaScript.
@@ -14,7 +14,7 @@ Architecture: browser → httpd → slowcgi → Perl CGI script.
Perl, httpd, slowcgi are in the OpenBSD base system. Instead of secrets, file
system permissions govern access.
-2025-12-30: Regex search.
+2025-12-30: Regex.
140-line Perl script searches 500 files in 40ms. Fast enough; O(N) pull felt at
higher file counts.
@@ -22,7 +22,7 @@ higher file counts.
Introduces ReDoS and symlink attack vectors. Both can be mitigated. Tempted to
stop here.
-2026-01-03: Suffix Array (SA) based index lookup.
+2026-01-03: Suffix array (SA) based index lookup.
Slurping files on every request bothers me. Regex search depends almost
entirely on hardware for speed.
@@ -36,8 +36,8 @@ $ cd cgi-bin/
$ perl indexer.pl
```
-Indexer extracts HTML, lowercases, encodes into UTF-8 binary sequences. Null
-byte sentinel for document boundaries. sa.bin stores suffix offsets as
+Indexer extracts HTML, lowercases, and encodes into UTF-8 binary sequences.
+Null byte sentinel for document boundaries. sa.bin stores suffix offsets as
32-bit unsigned integers, sorted by lexicographical order:
```
@@ -120,9 +120,12 @@ Resource exhaustion and XSS attacks are inherent. Former mitigated by limiting
concurrent searches via lock-file semaphores. Query length (64B) and result set
(20) capped. All output is HTML-escaped to prevent XSS.
-Secure by default. Fast. Durable.
+Verdict: Fast. Durable. Secure by default.
+
+Commit: <a
+href="https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e"
+class="external" target="_blank" rel="noopener noreferrer">6da102d</a> |
+Benchmarks: <a
+href="https://git.asciimx.com/site-search-bm/commit/?id=8a4da6809cf9368cd6a5dd7351181ea4256453f9"
+class="external" target="_blank" rel="noopener noreferrer">8a4da68</a>
-Commit:
-[6da102d](https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e)
-| Benchmarks:
-[8a4da68](https://git.asciimx.com/site-search-bm/commit/?id=8a4da6809cf9368cd6a5dd7351181ea4256453f9)