SITE SEARCH USING PERL + CGI
29 DECEMBER 2025
The number of articles on the site is growing. Need a way to search it.
Searching the RSS feed client-side using JavaScript is not an option. That
would make the feed much heavier and break the site for text-based web browsers
like Lynx.
Not gonna use an inverted index: more than an evening's effort, especially if I
want partial matching. And I want partial matching.
A few lines of Perl can do a regex search and send the results back via CGI.
OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No
dependencies.
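Wiring that up means pointing httpd at slowcgi(8), which translates FastCGI to plain CGI. A hypothetical httpd.conf fragment; the server name and script location are placeholders, and `fastcgi` without arguments uses slowcgi's default socket:

```
# /etc/httpd.conf -- sketch, not the actual config
server "example.com" {
        listen on * port 80
        location "/cgi-bin/*" {
                fastcgi
                root "/"
        }
}
```

Then `rcctl enable slowcgi httpd` and `rcctl start slowcgi httpd`. Scripts live under the /var/www chroot.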
Perl: traverse the directory tree with File::Find. If the search text is found,
grab the file name, title, and up to 50 chars from the first paragraph to
include in the search result.
use File::Find;

my @results;
find({
    wanted => sub {
        return unless -f $_ && $_ eq 'index.html';
        open my $fh, '<', $_ or return;
        my $content = do { local $/; <$fh> };
        close $fh;
        if ($content =~ /\Q$search_text\E/i) {
            # Extract title, snippet
            push @results, {
                path    => $File::Find::name,
                title   => $title,
                snippet => $snippet,
            };
        }
    },
    follow => 0,
}, $dir);
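The title and snippet extraction elided above could look something like this. `extract_title` and `extract_snippet` are hypothetical helpers, not the original script's code, and the regexes assume well-formed pages with a `<title>` and at least one `<p>`:

```perl
# Hypothetical helpers; assumes each page has a <title> element
# and at least one <p> element.
sub extract_title {
    my ($content) = @_;
    return $content =~ m{<title>(.*?)</title>}is ? $1 : '(untitled)';
}

sub extract_snippet {
    my ($content) = @_;
    if ($content =~ m{<p[^>]*>(.*?)</p>}is) {
        my $text = $1;
        $text =~ s/<[^>]+>//g;      # strip inline markup
        return substr($text, 0, 50);
    }
    return '';
}
```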
httpd passes the search text in the QUERY_STRING environment variable. No need
for Perl's CGI module.
my %params;
if ($ENV{QUERY_STRING}) {
    foreach my $pair (split /&/, $ENV{QUERY_STRING}) {
        my ($key, $value) = split /=/, $pair, 2;
        $value //= '';
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $params{$key} = $value;
    }
}
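Sending results back is just printing a CGI response (headers, blank line, body) to stdout. A sketch with rendering pulled into `render_results`, a hypothetical sub not in the original script; escaping of titles and snippets is assumed to happen before this point:

```perl
# Build a minimal CGI response for the matched pages.
# Each result is a hashref with path, title, and snippet keys.
sub render_results {
    my (@results) = @_;
    my $out = "Content-Type: text/html\r\n\r\n";
    $out .= "<ul>\n";
    for my $r (@results) {
        $out .= sprintf qq{<li><a href="%s">%s</a> - %s</li>\n},
            $r->{path}, $r->{title}, $r->{snippet};
    }
    $out .= "</ul>\n";
    return $out;
}
```

In the script itself this would end with `print render_results(@results);`.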
Security.
ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.
ReDoS: sanitized user input, length-limited search text, quoted metacharacters
with \Q$search_text\E.
XSS: sanitized user input. Escaped HTML.
Command injection: no exec()/system() calls. Non-privileged user (www).
Symlink attacks: File::Find doesn't follow symlinks (follow => 0). chroot.
Access controls: files 444; directories and the CGI script 554.
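The HTML escaping mentioned above can be done in five substitutions. `escape_html` is a hypothetical name for illustration, covering the characters that matter in element and attribute context:

```perl
# Escape the five HTML-significant characters. Ampersand must go
# first so later entities aren't double-escaped.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&#39;/g;
    return $s;
}
```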
Verdict: O(n) in the number of articles. Works on every conceivable browser.
Good enough.
Commit: 9fec793