Diffstat (limited to '_log')
| -rw-r--r-- | _log/site-search.md | 25 |
1 files changed, 14 insertions, 11 deletions
diff --git a/_log/site-search.md b/_log/site-search.md
index 0ff97fc..df5a7ab 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -9,9 +9,9 @@
 Needed search for site. Requirements: matches substrings, case-insensitive,
 fast, secure. No JavaScript.
 
-Architecture: browser → httpd → slowcgi → Perl CGI script.
+Architecture: browser → httpd → slowcgi → Perl script.
 
-SA index implemented with three files: corpus.bin, sa.bin, file_map.dat. Index
+Implemented SA index with three files: corpus.bin, sa.bin, file_map.dat. Index
 built with site:
 
 ```
@@ -40,8 +40,10 @@ my @sa = 0 .. (length($corpus) - 1);
 Sort is the bottleneck. Time complexity: O(L⋅N log N). Fast path caps L at 64
 bytes (length of a cache line) → O(N log N).
 
-Search: Textbook range query with twin binary searches. Uses fixed-width
-offsets for random access:
+32-bit offsets limits index size to 4GB (243k articles).
+
+Search: Textbook range query with twin binary searches. Fixed-width offsets
+enable random access:
 
 ```
 seek($fh_sa, $mid * 4, 0);
@@ -80,17 +82,18 @@ regex search:
  - Peak RAM (SA): 12504 KB
  - Peak RAM (Regex): 12804 KB
  - Search (SA): 0.0161 s
- - Search (Regex): 0.9120 s
+ - Search (Regex): 0.9120 S
 
-Security: httpd, slowcgi, Perl in base system--no dependencies. File system
-permissions govern access. Runs in chroot.
+Security: httpd, slowcgi, Perl are in the base system--no dependencies. File
+system permissions govern access. Runs in chroot.
 
 Resource exhaustion and XSS attacks are inherent. Lock-file semaphores limit
-concurrent searches. Query length (64B) and result set (20) capped. All output
-is HTML-escaped to prevent XSS.
+concurrent searches. Query length (64B) and result set (20) are capped. All
+output is HTML-escaped to prevent XSS.
+
+Warranty: 10,000 / 12 → 833 years.
 
-Warranty: 10,000 / 12 → 833 years. Next release: inverted index, year of our
-Lord 2859.
+Next release: inverted index; Anno Domini 2859.
 
Commit: <a href="https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e" |
