---
title: Site search using Perl + CGI
date: 2025-12-29
layout: post
---

The number of articles on the site is growing. Need a way to search it.

Searching the RSS feed client-side with JavaScript is not an option. That would make the feed much heavier and break the site for text-based web browsers like Lynx.

Not gonna use an inverted index. More than an evening's effort, especially if I want partial matching. And I want partial matching.

A few lines of Perl can do a regex search and send the results back via CGI. OpenBSD httpd speaks FastCGI; slowcgi bridges that to plain CGI. Perl and slowcgi are both in the base system. No dependencies.

Perl: traverse the site directory with File::Find. If the search text is found, grab the file name, the title, and up to 50 chars from the first paragraph to include in the search result.

```
use File::Find;

find({
    wanted => sub {
        # Only look at each article's index.html.
        return unless -f $_ && $_ eq 'index.html';
        # ... file reading ...
        if ($content =~ /\Q$search_text\E/i) {
            # Extract title, snippet
            push @results, {
                path    => $File::Find::name,
                title   => $title,
                snippet => $snippet,
            };
        }
    },
    follow => 0,
}, $dir);
```

The search text arrives in the QUERY_STRING environment variable. No need for Perl's CGI module.

```
my %params;
if ($ENV{QUERY_STRING}) {
    foreach my $pair (split /&/, $ENV{QUERY_STRING}) {
        # Split on the first '=' only, then URL-decode the value.
        my ($key, $value) = split /=/, $pair, 2;
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $params{$key} = $value;
    }
}
```

Security. ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.

- ReDoS: sanitize user input, length-limit the search text, quote regex metacharacters with `\Q$search_text\E`.
- XSS: sanitize user input, escape HTML on output (rough sketch at the end of this post).
- Command injection: no exec()/system() calls. Runs as the non-privileged www user.
- Symlink attacks: File::Find doesn't follow symlinks (follow => 0), and everything runs inside httpd's chroot anyway.
- Access controls: files are 444; directories and the CGI script are 554.

Verdict: O(n) in the size of the site. Works on every conceivable browser. Good enough.

Commit: [9fec793](https://git.asciimx.com/www/commit/?h=term&id=9fec793abe0a73e5cd502a1d1e935e2413b85079)
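
Roughly what the escaping and output side looks like. A sketch, not the exact code in the commit: the `html_escape` helper, the `q` parameter name, and the 100-character limit are placeholders of mine; it assumes the `%params` and `@results` from the snippets above.

```
# Sketch only -- helper and parameter names are placeholders.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}

# Length-limit the search text before it ever reaches the regex (ReDoS).
my $search_text = substr($params{q} // '', 0, 100);

# CGI response: header, blank line, then the body.
print "Content-Type: text/html\n\n";
print "<ul>\n";
for my $r (@results) {
    printf qq{<li><a href="%s">%s</a> %s</li>\n},
        html_escape($r->{path}),
        html_escape($r->{title}),
        html_escape($r->{snippet});
}
print "</ul>\n";
```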