SITE SEARCH USING PERL + CGI

29 DECEMBER 2025

The number of articles on the site is growing. Need a way to search the site.

Searching the RSS feed client-side using JavaScript is not an option. That would make the feed much heavier and break the site for text-based web browsers like Lynx.

Not gonna use an inverted index: more than an evening’s effort, especially if I want partial matching. And I do want partial matching.

A few lines of Perl can do a regex search and send the result back via CGI. OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No dependencies.
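For reference, the httpd side is one fastcgi location handed off to slowcgi. A sketch of httpd.conf; the server name and paths are placeholders, not the site's actual config:

```
server "example.com" {
	listen on * port 80
	location "/cgi-bin/*" {
		root "/"
		fastcgi socket "/run/slowcgi.sock"
	}
}
```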

Perl: traverse the directory with File::Find. If the search text is found, grab the file name, title, and up to 50 chars from the first paragraph to include in the search result.

find({
    wanted => sub {
        return unless -f $_ && $_ eq 'index.html';
        open my $fh, '<', $_ or return;
        my $content = do { local $/; <$fh> };  # slurp the file
        close $fh;
        if ($content =~ /\Q$search_text\E/i) {
            # Extract title, snippet
            push @results, { 
                path    => $File::Find::name,
                title   => $title, 
                snippet => $snippet 
            };
        }
    },
    follow => 0,
}, $dir);
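The title/snippet extraction can be sketched like this. The regexes assume my own markup (one <title>, a first <p>); they're illustrative, not bulletproof HTML parsing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull a page title and a short snippet out of raw HTML.
sub extract {
    my ($content) = @_;
    my ($title) = $content =~ m{<title>([^<]*)</title>}i;
    my ($para)  = $content =~ m{<p[^>]*>(.*?)</p>}is;
    $para = '' unless defined $para;
    $para =~ s/<[^>]+>//g;                # strip inline tags
    my $snippet = substr($para, 0, 50);   # up to 50 chars
    return ($title // '', $snippet);
}
```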

httpd sets the search text in the QUERY_STRING environment variable. No need for Perl’s CGI module.

my %params;
if ($ENV{QUERY_STRING}) {
    foreach my $pair (split /&/, $ENV{QUERY_STRING}) {
        my ($key, $value) = split /=/, $pair, 2;  # limit 2: values may contain '='
        next unless defined $value;
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $params{$key} = $value;
    }
}
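The response side is just as simple: print a header block, a blank line, then the body. A sketch; the markup is a placeholder, not the site's actual template.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a CGI response: Content-Type header, blank line, HTML body.
# Note: real output must HTML-escape these fields first (see XSS below).
sub render {
    my (@results) = @_;
    my $out = "Content-Type: text/html\r\n\r\n";
    $out .= "<ul>\n";
    for my $r (@results) {
        $out .= qq{<li><a href="$r->{path}">$r->{title}</a> - $r->{snippet}</li>\n};
    }
    $out .= "</ul>\n";
    return $out;
}

print render({ path => '/posts/search/', title => 'Search', snippet => 'A few lines of Perl' });
```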

Security.

ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.

ReDoS: sanitized user input, length-limited the search text, quoted regex metacharacters with \Q$search_text\E.
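That defence fits in two lines. A minimal sketch; the 100-char cap is an assumption, pick your own limit:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cap query length before it reaches the regex engine, and let \Q...\E
# turn metacharacters into literals.
sub safe_match {
    my ($content, $query) = @_;
    my $needle = substr($query // '', 0, 100);   # length cap (assumed limit)
    return $content =~ /\Q$needle\E/i ? 1 : 0;   # literal, case-insensitive
}
```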

XSS: sanitized user input. Escaped HTML on output.
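Escaping without the CGI module is four substitutions; ampersand has to go first or it double-escapes the rest:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that matter in text and double-quoted attributes.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # first, or the entities below get mangled
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}
```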

Command injection: no exec()/system() calls. Non-privileged user (www).

Symlink attacks: File::Find doesn’t follow symlinks (follow => 0). httpd runs chrooted.

Access controls: files 444; directories and the CGI script 554.

Verdict: O(n) speed. Works on every conceivable browser. Good enough.

Commit: 9fec793