Diffstat (limited to '_log/site-search.md')
-rw-r--r-- _log/site-search.md | 16
1 files changed, 8 insertions, 8 deletions
diff --git a/_log/site-search.md b/_log/site-search.md
index 4c3f376..e9137de 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -1,5 +1,5 @@
---
-title: Site search
+title: Suffix-array search for static sites
date: 2026-01-03
layout: post
---
@@ -13,10 +13,10 @@ text browsers.
Architecture: browser ↔ httpd ↔ slowcgi ↔ search engine.
Server-side regex is viable for a personal site. But an index has clear
-advantages. Not much harder to implement.
+advantages. Not that much harder to implement.
-Index: suffix array (SA) implemented in Perl. Three files: corpus.bin, sa.bin,
-file_map.dat. Built with site:
+Index: suffix array (SA) implemented in Perl. Index—corpus.bin, sa.bin,
+file_map.dat—built with site:
```
$ JEKYLL_ENV=production bundle exec jekyll build
@@ -44,10 +44,10 @@ my @sa = 0 .. (length($corpus) - 1);
32-bit offsets limit index size to 4GB—more than sufficient and, if necessary,
easily expanded.
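A minimal sketch of the offset-width arithmetic. It assumes sa.bin stores one unsigned big-endian offset per corpus byte (pack template `N`), matching `my @sa = 0 .. (length($corpus) - 1);`. The byte order, the 8-byte fallback (`Q>`), and the helper names `offset_format` and `pack_sa` are illustrative assumptions, not the post's code. With 4-byte offsets the largest offset is 2^32 - 1, so the corpus tops out at 2^32 bytes (4 GiB), and sa.bin is four times the size of the corpus.

```perl
use strict;
use warnings;

# Assumed sa.bin layout (not confirmed by the post): one unsigned
# big-endian offset per corpus byte. 4-byte offsets ('N') cover any
# corpus up to 2**32 bytes; larger corpora fall back to 8 bytes ('Q>').
sub offset_format {
    my ($corpus_len) = @_;
    return $corpus_len <= 2**32 ? ('N', 4) : ('Q>', 8);
}

# Hypothetical writer: pack all suffix offsets at the chosen fixed width.
sub pack_sa {
    my ($corpus_len, @sa) = @_;
    my ($tmpl) = offset_format($corpus_len);
    return pack("$tmpl*", @sa);
}

my $corpus = "banana";
my @sa     = 0 .. (length($corpus) - 1);   # unsorted here; sorting is a separate step
my $bin    = pack_sa(length($corpus), @sa);
print length($bin), "\n";                   # 24 (6 offsets x 4 bytes)

my (undef, $width) = offset_format(5 * 2**30);   # a 5 GiB corpus
print "$width\n";                                 # 8
```

Because every entry has the same width, entry i starts at byte `width * i`; widening to 8 bytes doubles sa.bin but leaves that arithmetic unchanged.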
-Sort is the real bottleneck. Time complexity: O(L⋅N log N). Fast path caps L at
-64 bytes (length of a typical cache line).
+Sort (O(L⋅N log N)) is the real bottleneck. Fast path caps L at 64 bytes
+(length of a typical cache line).
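One way to read the fast path, sketched in Perl: suffixes are compared on at most their first L bytes, so each comparison costs O(L) and the sort O(L⋅N log N). The cap semantics, the numeric tie-break, and the name `build_sa` are assumptions. Suffixes sharing an L-byte prefix compare equal, so their relative order is arbitrary; that is harmless for prefix searches with queries of at most L bytes, because an array sorted by L-byte prefixes is also sorted by every shorter prefix.

```perl
use strict;
use warnings;

# Assumed cap on bytes compared per suffix (the "fast path").
my $L = 64;

# Hypothetical builder: sort every suffix offset by its first $cap bytes.
# Each comparison is O($cap), so the sort is O($cap * N log N).
# Suffixes sharing a $cap-byte prefix compare equal; falling back to the
# numeric offset only makes the output deterministic.
sub build_sa {
    my ($corpus, $cap) = @_;
    $cap //= $L;
    return sort {
        substr($corpus, $a, $cap) cmp substr($corpus, $b, $cap)
            or $a <=> $b
    } 0 .. length($corpus) - 1;
}

print join(' ', build_sa("banana")), "\n";      # 5 3 1 0 4 2
print join(' ', build_sa("banana", 2)), "\n";   # 5 1 3 0 2 4
```

With a cap of 2, the suffixes "ana" (offset 3) and "anana" (offset 1) tie on "an", so offset 1 lands first even though the full suffix order puts offset 3 first.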
-Search: Textbook range query with twin binary searches. Fixed-width offsets
+Search: Textbook range query with two binary searches. Fixed-width offsets
enable random access to index:
```
@@ -93,7 +93,7 @@ Security: httpd, slowcgi, Perl in OpenBSD base system. File system permissions
govern access. Runs in chroot.
Resource exhaustion and XSS attacks are inherent. Lock-file semaphores limit
-concurrent searches; query length (64B) and result set (20) are capped. All
+concurrent searches. Query length (64B) and result set (20) are capped. All
output is HTML-escaped to prevent XSS.
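The guards in this paragraph might look like the following Perl sketch, using only core modules. The slot-file naming (`search.N.lock`), the slot count, and the helpers `acquire_slot`, `validate_query`, `cap_results`, and `html_escape` are hypothetical. The semaphore is a set of N lock files: a request takes the first one it can lock with `flock(LOCK_EX | LOCK_NB)` and is turned away when all are held; the lock is released when the handle is closed or the process exits.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_CREAT O_RDWR);
use File::Spec;
use File::Temp qw(tempdir);

# Limits from the text; the variable names are hypothetical.
my $MAX_QUERY   = 64;    # bytes, assuming the query is not decoded to characters
my $MAX_RESULTS = 20;

# Counting semaphore over $slots lock files in $dir. Returns a locked
# filehandle, or undef when every slot is taken (caller answers "busy").
sub acquire_slot {
    my ($dir, $slots) = @_;
    for my $i (0 .. $slots - 1) {
        my $path = File::Spec->catfile($dir, "search.$i.lock");
        sysopen(my $fh, $path, O_CREAT | O_RDWR) or die "open $path: $!";
        return $fh if flock($fh, LOCK_EX | LOCK_NB);
        close $fh;
    }
    return;
}

sub validate_query {
    my ($q) = @_;
    return defined($q) && length($q) > 0 && length($q) <= $MAX_QUERY;
}

# Keep at most $MAX_RESULTS hits.
sub cap_results {
    my @hits = @_;
    return @hits > $MAX_RESULTS ? @hits[0 .. $MAX_RESULTS - 1] : @hits;
}

# Escape & first so entities produced by later substitutions stay intact.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# Demo only: a real deployment would use a fixed lock directory in the chroot.
my $dir  = tempdir(CLEANUP => 1);
my $slot = acquire_slot($dir, 4) or die "busy\n";
print html_escape(q{<b>"a&b"</b>}), "\n";   # &lt;b&gt;&quot;a&amp;b&quot;&lt;/b&gt;
close $slot;                                # releases the lock
```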
Warranty: 10,000 / 12 → 833 years.
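The range query from the Search paragraph, sketched in Perl under an assumed layout of 32-bit big-endian offsets. Every occurrence of a query q is a suffix that starts with q, and those suffixes form one contiguous block of the sorted array. A lower-bound search finds the first suffix whose |q|-byte prefix is ≥ q, an upper-bound search finds the first whose prefix is > q, and the half-open range between them holds the matches. Because offsets are fixed-width, entry i is read directly at byte 4i. The sketch holds sa.bin in a string; against a file, the same arithmetic would drive seek and read. `sa_at`, `bound`, and `search` are hypothetical names.

```perl
use strict;
use warnings;

# Read entry $i of a packed suffix array. Assumed layout: 32-bit
# big-endian offsets ('N'), so entry $i starts at byte 4 * $i.
sub sa_at {
    my ($sa_bin, $i) = @_;
    return unpack('N', substr($sa_bin, 4 * $i, 4));
}

# Binary search over SA entries, comparing only the first length($q)
# bytes of each suffix. $strict = 0: first entry whose prefix ge $q
# (lower bound). $strict = 1: first entry whose prefix gt $q (upper bound).
sub bound {
    my ($corpus, $sa_bin, $q, $strict) = @_;
    my ($lo, $hi) = (0, length($sa_bin) / 4);
    my $m = length $q;
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c   = substr($corpus, sa_at($sa_bin, $mid), $m) cmp $q;
        if ($c < 0 || ($strict && $c == 0)) { $lo = $mid + 1 }
        else                                { $hi = $mid }
    }
    return $lo;
}

# Corpus offsets of every occurrence of $q: the SA entries in [lo, hi).
sub search {
    my ($corpus, $sa_bin, $q) = @_;
    my $lo = bound($corpus, $sa_bin, $q, 0);
    my $hi = bound($corpus, $sa_bin, $q, 1);
    return map { sa_at($sa_bin, $_) } $lo .. $hi - 1;
}

my $corpus = "banana";
my $sa_bin = pack('N*', 5, 3, 1, 0, 4, 2);   # suffix array of "banana"
print join(' ', sort { $a <=> $b } search($corpus, $sa_bin, "ana")), "\n";   # 1 3
```

Both searches take O(|q| log N) time, and a query that matches nothing yields an empty range (lo equals hi).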