| author | Sadeep Madurange <sadeep@asciimx.com> | 2026-01-10 15:34:34 +0800 |
|---|---|---|
| committer | Sadeep Madurange <sadeep@asciimx.com> | 2026-01-10 15:34:34 +0800 |
| commit | f0d65be8cef87084f65f373ddfe51ce5c8405879 (patch) | |
| tree | e453cd2ccd6320799fb93e9ccb21c761c60d0a05 /_log/site-search.md | |
| parent | a02d34ac1afa2eedcce5ce4eba5d3a6cdfdd8ec4 (diff) | |
| download | www-f0d65be8cef87084f65f373ddfe51ce5c8405879.tar.gz | |
IBM VGA fonts, major changes to typography, update site-search.
Diffstat (limited to '_log/site-search.md')
| -rw-r--r-- | _log/site-search.md | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/_log/site-search.md b/_log/site-search.md
index 1752db7..4ccae3c 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -4,7 +4,7 @@ date: 2026-01-03
 layout: post
 ---
 
-Article count is growing. Need a way to search.
+Article count is growing. Need search.
 
 Requirements: matches substrings, case-insensitive, fast, secure. No
 JavaScript.
@@ -14,7 +14,7 @@ Architecture: browser → httpd → slowcgi → Perl CGI script.
 Perl, httpd, slowcgi are in the OpenBSD base system. Instead of secrets, file
 system permissions govern access.
 
-2025-12-30: Regex search.
+2025-12-30: Regex.
 
 140-line Perl script searches 500 files in 40ms. Fast enough; O(N) pull felt at
 higher file counts.
@@ -22,7 +22,7 @@ higher file counts.
 Introduces ReDoS and symlink attack vectors. Both can be mitigated. Tempted to
 stop here.
 
-2026-01-03: Suffix Array (SA) based index lookup.
+2026-01-03: Suffix array (SA) based index lookup.
 
 Slurping files on every request bothers me. Regex search depends almost
 entirely on hardware for speed.
@@ -36,8 +36,8 @@ $ cd cgi-bin/
 $ perl indexer.pl
 ```
 
-Indexer extracts HTML, lowercases, encodes into UTF-8 binary sequences. Null
-byte sentinel for document boundaries. sa.bin stores suffix offsets as
+Indexer extracts HTML, lowercases, and encodes into UTF-8 binary sequences.
+Null byte sentinel for document boundaries. sa.bin stores suffix offsets as
 32-bit unsigned integers, sorted by lexicographical order:
 
 ```
@@ -120,7 +120,7 @@ Resource exhaustion and XSS attacks are inherent. Former mitigated by limiting
 concurrent searches via lock-file semaphores. Query length (64B) and result set
 (20) capped. All output is HTML-escaped to prevent XSS.
 
-Secure by default. Fast. Durable.
+Verdict: Fast. Durable. Secure by default.
 
 Commit:
 <a href="https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e"
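
For readers unfamiliar with the index the post describes, here is a minimal Python sketch of the idea: lowercase and UTF-8-encode the documents, separate them with a null byte, sort the suffix offsets, and pack them as 32-bit unsigned integers. This is not the author's Perl code. The function names `build_index` and `search`, the little-endian byte order, and the naive sort-based construction are all assumptions made for illustration. The 64-byte query cap and the 20-result cap come from the post.

```python
import struct

def build_index(docs):
    """Return (corpus, sa_bytes) for a list of document strings (hypothetical helper)."""
    # Lowercase, encode as UTF-8, and terminate each document with a null byte.
    corpus = b"".join(d.lower().encode("utf-8") + b"\x00" for d in docs)
    # Naive construction: sort every offset by the suffix that starts there.
    offsets = sorted(range(len(corpus)), key=lambda i: corpus[i:])
    # Pack offsets as 32-bit unsigned integers (little-endian is an assumption).
    return corpus, struct.pack(f"<{len(offsets)}I", *offsets)

def search(corpus, sa_bytes, query, limit=20):
    """Return up to `limit` corpus offsets where `query` occurs (hypothetical helper)."""
    needle = query.lower().encode("utf-8")
    if not needle or len(needle) > 64:  # query length cap taken from the post
        return []
    n = len(sa_bytes) // 4

    def entry(k):
        # Read the k-th offset and the suffix prefix it points to.
        (off,) = struct.unpack_from("<I", sa_bytes, 4 * k)
        return off, corpus[off:off + len(needle)]

    # Binary search for the first suffix whose prefix is >= needle.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if entry(mid)[1] < needle:
            lo = mid + 1
        else:
            hi = mid

    # Matching suffixes are contiguous in the sorted array; collect them.
    hits = []
    while lo < n and len(hits) < limit:
        off, prefix = entry(lo)
        if prefix != needle:
            break
        hits.append(off)
        lo += 1
    return hits
```

Each lookup costs O(log N) comparisons instead of a scan over every file, which is the trade the post makes against the regex approach. A real indexer would likely use a faster construction than sorting full suffixes.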
