Diffstat (limited to '_log/search-with-cgi.md')
| -rw-r--r-- | _log/search-with-cgi.md | 73 |
1 files changed, 0 insertions, 73 deletions
diff --git a/_log/search-with-cgi.md b/_log/search-with-cgi.md
deleted file mode 100644
index 0109294..0000000
--- a/_log/search-with-cgi.md
+++ /dev/null
@@ -1,73 +0,0 @@
----
-title: Site search using Perl + CGI
-date: 2025-12-29
-layout: post
----
-
-Number of articles on the site are growing. Need a way to search site.
-
-Searching the RSS feed client-side using JavaScript is not an option. That
-would make the feed much heavier and break the site for text-based web browsers
-like Lynx.
-
-Not gonna use an inverted index--More than an evening's effort, especially if I
-want partial matching. I want partial matching.
-
-Few lines of Perl could do a regex search and send the result back via CGI.
-OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No
-dependencies.
-
-Perl: traverse directory with File::Find. If search text is found grab the file
-name, title and up to 50 chars from the first paragraph to include in the
-search result.
-
-```
-find({
-    wanted => sub {
-        return unless -f $_ && $_ eq 'index.html';
-        # ... file reading ...
-        if ($content =~ /\Q$search_text\E/i) {
-            # Extract title, snippet
-            push @results, {
-                path => $File::Find::name,
-                title => $title,
-                snippet => $snippet
-            };
-        }
-    },
-    follow => 0,
-}, $dir);
-```
-
-httpd sets the search text in QUERY_STRING env. Don't need Perl's CGI module.
-
-```
-my %params;
-if ($ENV{QUERY_STRING}) {
-    foreach my $pair (split /&/, $ENV{QUERY_STRING}) {
-        my ($key, $value) = split /=/, $pair;
-        $value =~ tr/+/ /;
-        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
-        $params{$key} = $value;
-    }
-}
-```
-
-Security.
-
-ReDOS, XSS, command injection, symlink attacks. Did I miss anything? Probably.
-
-ReDOS: sanitized user input, length-limit search text, quote metacharacters
-with `\Q$search_text\E`.
-
-XSS: sanitized user input. Escaped HTML.
-
-Command injection: no exec()/system() calls. Non-privileged user (www).
-
-Symlink attacks: File::Find don't follow symlinks (follow => 0). chroot.
-
-Access controls: files (444), directories and CGI script: 554.
-
-Verdict: O(n) speed. Works on every conceivable browser. Good enough.
-
-Commit: [9fec793](https://git.asciimx.com/www/commit/?h=term&id=9fec793abe0a73e5cd502a1d1e935e2413b85079)
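
The removed post names three input-handling steps (decode QUERY_STRING, length-limit the search text, escape HTML) but only shows the first. A minimal sketch of all three follows, using core Perl only. The helper names `parse_query`, `limit_length`, and `escape_html`, and the 64-character cap, are assumptions for illustration; the post does not give the actual limit or the real helper code. Unlike the snippet in the post, this version passes a limit of 2 to `split` so that an `=` inside a value survives, and it skips pairs with no key.

```perl
use strict;
use warnings;

# Hypothetical cap; the post does not state the real limit.
my $MAX_LEN = 64;

# Decode a QUERY_STRING into a hash of key => value.
sub parse_query {
    my ($qs) = @_;
    my %params;
    return %params unless defined $qs && length $qs;
    foreach my $pair (split /&/, $qs) {
        # A limit of 2 keeps any '=' inside the value intact.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                              # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;  # percent-decoding
        }
        $params{$key} = $value;
    }
    return %params;
}

# Truncate the search text so an oversized query is never matched in full.
sub limit_length {
    my ($text) = @_;
    return substr($text, 0, $MAX_LEN);
}

# Escape the characters that are significant in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    my %map = (
        '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$map{$1}/g;
    return $text;
}

my %params = parse_query('q=a%3Db+%3Cscript%3E&page=1');
my $search_text = limit_length($params{q} // '');
print escape_html($search_text), "\n";
```

For the sample query string, the decoded search text is `a=b <script>`, and the printed line is `a=b &lt;script&gt;`. Escaping happens only at output time, so the regex match itself still runs against the raw (length-limited) text quoted with `\Q...\E`.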
