Diffstat (limited to '_log/site-search.md')
| -rw-r--r-- | _log/site-search.md | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/_log/site-search.md b/_log/site-search.md
index 1752db7..4ccae3c 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -4,7 +4,7 @@ date: 2026-01-03
 layout: post
 ---
 
-Article count is growing. Need a way to search.
+Article count is growing. Need search.
 
 Requirements: matches substrings, case-insensitive, fast, secure. No JavaScript.
 
@@ -14,7 +14,7 @@ Architecture: browser → httpd → slowcgi → Perl CGI script.
 Perl, httpd, slowcgi are in the OpenBSD base system. Instead of secrets, file
 system permissions govern access.
 
-2025-12-30: Regex search.
+2025-12-30: Regex.
 
 140-line Perl script searches 500 files in 40ms. Fast enough; O(N) pull felt at
 higher file counts.
@@ -22,7 +22,7 @@ higher file counts.
 Introduces ReDoS and symlink attack vectors. Both can be mitigated. Tempted to
 stop here.
 
-2026-01-03: Suffix Array (SA) based index lookup.
+2026-01-03: Suffix array (SA) based index lookup.
 
 Slurping files on every request bothers me. Regex search depends almost entirely
 on hardware for speed.
@@ -36,8 +36,8 @@ $ cd cgi-bin/
 $ perl indexer.pl
 ```
 
-Indexer extracts HTML, lowercases, encodes into UTF-8 binary sequences. Null
-byte sentinel for document boundaries. sa.bin stores suffix offsets as
+Indexer extracts HTML, lowercases, and encodes into UTF-8 binary sequences.
+Null byte sentinel for document boundaries. sa.bin stores suffix offsets as
 32-bit unsigned integers, sorted by lexicographical order:
 
 ```
@@ -120,7 +120,7 @@ Resource exhaustion and XSS attacks are inherent. Former mitigated by limiting
 concurrent searches via lock-file semaphores. Query length (64B) and result set
 (20) capped. All output is HTML-escaped to prevent XSS.
 
-Secure by default. Fast. Durable.
+Verdict: Fast. Durable. Secure by default.
 
 Commit:
 <a href="https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e"
