PERL + FASTCGI + SA SEARCH ENGINE
02 JANUARY 2026
Number of articles growing. Need search.
Requirements: substring match, case-insensitive, fast, secure. No JavaScript.
Architecture: OpenBSD httpd → slowcgi (FastCGI-to-CGI wrapper) → Perl CGI script.
Data structure: suffix array. Three files: corpus.bin (articles), sa.bin (byte offsets into corpus.bin, sorted by the suffix each one starts), file_map.dat (metadata).
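To make "offsets sorted by suffix" concrete, here is a toy suffix array. It is a sketch, not the indexer: the two-document corpus is made up, and the sa.bin layout (one unsigned 32-bit big-endian integer per offset, via `pack 'N*'`) is an assumption, since the post doesn't specify the format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy corpus: two lowercased "documents" separated by a NUL sentinel.
my $corpus = "banana\0band";

# Suffix array: every byte offset, ordered by the suffix starting there.
# Naive full-suffix comparison is fine at this size.
my @sa = sort { substr($corpus, $a) cmp substr($corpus, $b) }
         0 .. length($corpus) - 1;

for my $off (@sa) {
    (my $shown = substr($corpus, $off)) =~ s/\0/\\0/g;   # make NUL visible
    printf "%2d  %s\n", $off, $shown;
}

# Assumed on-disk layout for sa.bin: 4 bytes per offset, big-endian.
my $packed = pack('N*', @sa);
printf "sa.bin: %d bytes for %d offsets\n", length($packed), scalar @sa;
```

The sentinel suffix sorts first, and every suffix beginning with "an" (offsets 3, 1 and 8) lands in one contiguous run. That contiguity is what lets a binary search answer a substring query with a range.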
Indexer crawls posts, extracts text from HTML with regex, lowercases, concatenates. Null byte sentinel marks document boundaries. Sort suffix offsets lexicographically::
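    # Assumes $corpus holds the raw bytes of corpus.bin and
    # @sa = (0 .. length($corpus) - 1) before sorting.
    # The bare block keeps "use bytes" (byte-level comparison) lexically scoped.
    {
        use bytes;
        @sa = sort {
            # Fast path: compare the first 64 bytes of each suffix.
            (substr($corpus, $a, 64) cmp substr($corpus, $b, 64)) ||
            # Fallback, only when the 64-byte prefixes are equal: compare the
            # full suffixes (required for correctness; builds a copy of each tail).
            (substr($corpus, $a) cmp substr($corpus, $b))
        } @sa;
    }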
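The crawl-and-concatenate step that feeds the sort can be sketched as follows. The helper names (`html_to_text`, `build_corpus`), the tag-stripping regexes, the tiny entity subset, and the in-memory `[start_offset, name]` records standing in for file_map.dat are all hypothetical. Regexes are only adequate for HTML you control.

```perl
use strict;
use warnings;

# Hypothetical regex-based text extraction; fine for your own templates,
# not a general HTML parser.
sub html_to_text {
    my ($html) = @_;
    $html =~ tr/\0//d;                               # NUL is reserved for the sentinel
    $html =~ s{<(script|style)\b.*?</\1\s*>}{ }gis;  # drop script/style bodies
    $html =~ s{<[^>]*>}{ }g;                         # drop remaining tags
    $html =~ s{&lt;}{<}g;                            # minimal entity subset,
    $html =~ s{&gt;}{>}g;                            # &amp; last so nothing
    $html =~ s{&amp;}{&}g;                           # is decoded twice
    $html =~ s{\s+}{ }g;                             # collapse whitespace
    $html =~ s{^\s+|\s+$}{}g;                        # trim
    return lc $html;
}

# Concatenate documents, recording where each one starts in the corpus.
sub build_corpus {
    my @docs = @_;                                   # list of [name, html] pairs
    my $corpus = '';
    my @map;
    for my $doc (@docs) {
        my ($name, $html) = @$doc;
        push @map, [length($corpus), $name];         # stand-in for a file_map.dat record
        $corpus .= html_to_text($html) . "\0";       # NUL sentinel closes each document
    }
    return ($corpus, \@map);
}

my ($corpus, $map) = build_corpus(
    ['a.html', '<h1>Hello</h1><p>Arduino &amp; Perl</p>'],
    ['b.html', '<p>Suffix <b>arrays</b></p>'],
);
my @sa = (0 .. length($corpus) - 1);                 # input expected by the sort above
```

Recording start offsets during concatenation keeps the offset-to-document lookup at search time a simple binary search over a sorted list.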
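Slow path: O(L⋅N log N), where L is the number of bytes a comparison examines. The fast path caps L at 64 bytes whenever two suffixes differ within their first 64 bytes → O(N log N) in practice. Suffixes that share a 64-byte prefix (repeated boilerplate, long runs of one character) still take the slow path, so the worst case is unchanged. 64 bytes matches the x86 cache-line size.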
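One way to see when the fast path helps is to count how often the comparator falls through to the full-suffix compare. This sketch reuses the comparator above with a hypothetical `$fallbacks` counter, and the two inputs are made up. Distinct bytes always resolve on the first byte, while a long run of one byte forces the fallback for pairs of suffixes that both have at least 64 bytes left.

```perl
use strict;
use warnings;

my $PREFIX    = 64;   # fast-path window, as in the indexer's comparator
my $fallbacks = 0;    # hypothetical counter: comparisons that needed full suffixes

sub sort_suffixes {
    my ($corpus) = @_;
    $fallbacks = 0;
    my @sa = sort {
        (substr($corpus, $a, $PREFIX) cmp substr($corpus, $b, $PREFIX))
            || do { $fallbacks++; substr($corpus, $a) cmp substr($corpus, $b) }
    } 0 .. length($corpus) - 1;
    return @sa;
}

for my $case (['distinct bytes', join('', map { chr($_) } 0 .. 255)],
              ['repeated byte',  'a' x 300]) {
    my ($label, $text) = @$case;
    my @sa = sort_suffixes($text);
    printf "%-14s %3d suffixes, %d full-suffix fallbacks\n",
        $label, scalar @sa, $fallbacks;
}
```

The distinct-byte input never falls back; the repeated-byte input does, and each fallback also builds copies of both tails.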
Search: binary search for the range of suffixes that begin with the query. Cap at 20 results: define limits or be surprised by them.
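A minimal in-memory sketch of the range query, assuming a suffix array of offsets sorted by suffix and a sorted list of document start offsets standing in for file_map.dat. Sorted suffixes stay sorted when truncated to the query length, so a lower-bound and an upper-bound binary search over those truncated keys bracket every match. The names `sa_range`, `doc_for` and `$MAX_RESULTS` are hypothetical.

```perl
use strict;
use warnings;

my $MAX_RESULTS = 20;   # the cap from the post

# Half-open range [first, last) of suffix-array slots whose suffix starts with $q.
sub sa_range {
    my ($corpus, $sa, $q) = @_;
    my $len = length $q;
    my $key = sub { substr($corpus, $sa->[$_[0]], $len) };   # truncated suffix

    my ($lo, $hi) = (0, scalar @$sa);
    while ($lo < $hi) {                       # lower bound: first key ge $q
        my $mid = int(($lo + $hi) / 2);
        if ($key->($mid) lt $q) { $lo = $mid + 1 } else { $hi = $mid }
    }
    my $first = $lo;

    $hi = scalar @$sa;
    while ($lo < $hi) {                       # upper bound: first key gt $q
        my $mid = int(($lo + $hi) / 2);
        if ($key->($mid) le $q) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return ($first, $lo);
}

# Index of the document containing $off: last start offset <= $off.
sub doc_for {
    my ($starts, $off) = @_;
    my ($lo, $hi) = (0, $#$starts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);
        if ($starts->[$mid] <= $off) { $lo = $mid } else { $hi = $mid - 1 }
    }
    return $lo;
}

my $corpus = "banana\0band";
my @starts = (0, 7);                          # document start offsets
my @sa = sort { substr($corpus, $a) cmp substr($corpus, $b) } 0 .. length($corpus) - 1;

my ($first, $last) = sa_range($corpus, \@sa, 'an');
$last = $first + $MAX_RESULTS if $last - $first > $MAX_RESULTS;   # apply the cap
my @hits = map { [ $sa[$_], doc_for(\@starts, $sa[$_]) ] } $first .. $last - 1;
printf "offset %d in document %d\n", @$_ for @hits;
```

For "an" this reports offsets 3 and 1 in document 0 and offset 8 in document 1. The post doesn't say whether the cap counts match positions or distinct documents; this sketch caps match positions.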
File I/O and memory: many small seek/read calls beat one large allocation (see the find_one_file.cgi benchmarks).
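The seek/read approach suits a suffix array: each binary-search probe needs one 4-byte offset from sa.bin and a query-length slice of corpus.bin, so a lookup touches O(log N) small chunks instead of loading both files. A sketch, assuming the hypothetical 32-bit big-endian sa.bin layout. File::Temp only stands in for the real index files, and `sa_entry`, `corpus_slice` and `disk_range` are illustrative names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Offset stored in slot $i of sa.bin (assumed: 4-byte big-endian per slot).
sub sa_entry {
    my ($sa_fh, $i) = @_;
    seek($sa_fh, 4 * $i, 0) or die "seek sa.bin: $!";
    my $n = read($sa_fh, my $buf, 4);
    die "short read on sa.bin" unless defined $n && $n == 4;
    return unpack('N', $buf);
}

# Up to $len bytes of corpus.bin starting at $off (fewer near end of file).
sub corpus_slice {
    my ($c_fh, $off, $len) = @_;
    seek($c_fh, $off, 0) or die "seek corpus.bin: $!";
    defined read($c_fh, my $buf, $len) or die "read corpus.bin: $!";
    return $buf;
}

# Lower/upper-bound binary search that reads only what each probe needs.
sub disk_range {
    my ($c_fh, $sa_fh, $q) = @_;
    my $n   = (-s $sa_fh) / 4;                # number of suffix slots
    my $key = sub { corpus_slice($c_fh, sa_entry($sa_fh, $_[0]), length $q) };
    my ($lo, $hi) = (0, $n);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($key->($mid) lt $q) { $lo = $mid + 1 } else { $hi = $mid }
    }
    my $first = $lo;
    $hi = $n;
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($key->($mid) le $q) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return ($first, $lo);
}

# Stand-in index files built from a toy corpus.
my $corpus = "banana\0band";
my @sa = sort { substr($corpus, $a) cmp substr($corpus, $b) } 0 .. length($corpus) - 1;
my ($tmp_c, $c_path)  = tempfile(UNLINK => 1);
my ($tmp_s, $sa_path) = tempfile(UNLINK => 1);
binmode($_) for $tmp_c, $tmp_s;
print {$tmp_c} $corpus;
print {$tmp_s} pack('N*', @sa);
close($_) or die "close: $!" for $tmp_c, $tmp_s;

open(my $c_fh,  '<:raw', $c_path)  or die "open corpus: $!";
open(my $sa_fh, '<:raw', $sa_path) or die "open sa: $!";
my ($first, $last) = disk_range($c_fh, $sa_fh, 'an');
print "slots $first..", $last - 1, ": offsets ",
    join(', ', map { sa_entry($sa_fh, $_) } $first .. $last - 1), "\n";
```

Memory stays flat regardless of index size; the cost moves to a handful of syscalls per probe, which the OS page cache absorbs after the first few queries.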
Benchmarks on T490 (i7-10510U, OpenBSD 7.8, 16KB articles):
1,000 files: 0.31s indexing, 410 KB total index.
10,000 files: 10.97s indexing, 4.16 MB total index.
Search ‘arduino’ (0 matches):
1,000 files: 0.002s (SA) vs 0.080s (naive regex).
10,000 files: 0.016s (SA) vs 0.912s (naive regex).
Security. Semaphore (lock files) limits parallel queries. Escape HTML output (XSS). Sanitize input: strip non-printables, limit length, and quote metacharacters (ReDoS). No exec/system (command injection). Chroot.
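The input-side measures fit in a few helpers. Everything here is an assumption for illustration: the helper names, the 64-character limit, four concurrent slots, an ASCII-only notion of "printable", and the lock directory, which must exist and be writable inside the chroot. The semaphore uses one lock file per slot and `flock` with `LOCK_NB`, so a busy server turns a query away instead of queuing it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_CREAT O_RDWR);

my $MAX_QUERY = 64;       # hypothetical length limit
my $MAX_SLOTS = 4;        # hypothetical number of concurrent searches
my $LOCK_DIR  = '/tmp';   # hypothetical; must be writable inside the chroot

# Strip non-printables, trim, cap length, lowercase to match the corpus.
sub sanitize_query {
    my ($q) = @_;
    $q = '' unless defined $q;
    $q =~ s/[^\x20-\x7e]//g;          # ASCII printables only (assumption)
    $q =~ s/^\s+|\s+$//g;
    $q = substr($q, 0, $MAX_QUERY);
    return lc $q;
}

# Escape HTML-significant characters before echoing text back.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;                # must run first
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# If a query ever reaches a regex (e.g. a naive fallback), quote it first.
sub literal_regex { my ($q) = @_; return qr/\Q$q\E/ }

# Counting semaphore: try each slot file with a non-blocking exclusive lock.
# Returns the held handle, or nothing if every slot is busy.
sub acquire_slot {
    for my $i (1 .. $MAX_SLOTS) {
        sysopen(my $fh, "$LOCK_DIR/search.$i.lock", O_CREAT | O_RDWR, 0600) or next;
        return $fh if flock($fh, LOCK_EX | LOCK_NB);
        close $fh;
    }
    return;
}

my $q = sanitize_query("  Ardu\x00ino<script>\n");   # 'arduino<script>'
print 'You searched for: ', html_escape($q), "\n";  # arduino&lt;script&gt;

if (my $slot = acquire_slot()) {
    # ... run the suffix-array search while $slot is held ...
    close $slot;                                    # releases the lock, freeing the slot
} else {
    # every slot busy: reply with a CGI "Status: 503" header instead of queuing
}
```

Because a lock disappears when its handle closes, a crashed request frees its slot automatically; stale lock files on disk are harmless.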
Verdict: Fast SA lookup. Primary attack vectors mitigated. No dependencies.