authorSadeep Madurange <sadeep@asciimx.com>2025-12-30 22:36:53 +0800
committerSadeep Madurange <sadeep@asciimx.com>2025-12-30 22:36:53 +0800
commitb65dabd3a2c83404e6612ce881ba4ad32e0b1ccd (patch)
treed0f06062fe0178b4539e3103b558427037bb6afc /_log
parent9fec793abe0a73e5cd502a1d1e935e2413b85079 (diff)
downloadwww-b65dabd3a2c83404e6612ce881ba4ad32e0b1ccd.tar.gz
Readme.
Diffstat (limited to '_log')
-rw-r--r--  _log/search-with-cgi.md | 98
1 file changed, 38 insertions(+), 60 deletions(-)
diff --git a/_log/search-with-cgi.md b/_log/search-with-cgi.md
index 2578878..0109294 100644
--- a/_log/search-with-cgi.md
+++ b/_log/search-with-cgi.md
@@ -4,47 +4,42 @@ date: 2025-12-29
layout: post
---
-Need a way to search site--number of articles are growing.
+The number of articles on the site is growing. Need a way to search it.
-Searching site client-side using the RSS feed and JavaScript is not an option--
-bloats the feed and breaks the site for Lynx and other text browsers.
+Searching the RSS feed client-side using JavaScript is not an option. That
+would make the feed much heavier and break the site for text-based web browsers
+like Lynx.
-Perl's great for text processing--especially regex work. Few lines of Perl
-could do a regex search and send the result back via CGI. OpenBSD httpd speaks
-CGI, Perl and slowcgi are in the base systems. No dependencies. Works on every
-conceivable browser.
+Not gonna use an inverted index--more than an evening's effort, especially if
+I want partial matching. And I want partial matching.
-Perl: traverse the directory with File::Find recursively. If search text is
-found grab the file name, title and up to 50 chars of the first paragraph to
-include in the search result.
+A few lines of Perl can do a regex search and send the result back via CGI.
+OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No
+dependencies.
+
+Perl: traverse the directory with File::Find. If the search text is found,
+grab the file name, title, and up to 50 chars from the first paragraph to
+include in the search result.
```
-find(sub {
- if (open my $fh, '<', $_) {
- my $content = do { local $/; <$fh> };
- close $fh;
-
- if ($content =~ /\Q$search_text\E/i) {
- my ($title) = $content =~ /<title>(.*?)<\/title>/is;
- $title ||= $File::Find::name;
- my ($p_content) = $content =~ /<p[^>]*>(.*?)<\/p>/is;
- my $snippet = $p_content || "";
- $snippet =~ s/<[^>]*>//g;
- $snippet =~ s/\s+/ /g;
- $snippet = substr($snippet, 0, 50);
- $snippet .= "..." if length($p_content || "") > 50;
-
- push @results, {
- path => $File::Find::name,
- title => $title,
- snippet => $snippet
- };
- }
- }
+find({
+ wanted => sub {
+ return unless -f $_ && $_ eq 'index.html';
+ # ... file reading ...
+ if ($content =~ /\Q$search_text\E/i) {
+ # Extract title, snippet
+ push @results, {
+ path => $File::Find::name,
+ title => $title,
+ snippet => $snippet
+ };
+ }
+ },
+ follow => 0,
}, $dir);
```
-Don't need the Perl CGI module, httpd sets QUERY_STRING for the slowcgi script:
+httpd sets the search text in the QUERY_STRING environment variable. No need
+for Perl's CGI module.
```
my %params;
@@ -58,38 +53,21 @@ if ($ENV{QUERY_STRING}) {
}
```
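
The decode step elided by the hunk above can be sketched as a small sub (a
sketch, not the post's exact code; the sub name and decode rules are my
assumptions):

```perl
use strict;
use warnings;

# Hand-rolled query-string parser -- no CGI.pm needed.
sub parse_query {
    my ($qs) = @_;
    my %params;
    for my $pair (split /&/, $qs // '') {
        my ($key, $value) = split /=/, $pair, 2;
        $value //= '';
        for ($key, $value) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX is a hex-encoded byte
        }
        $params{$key} = $value;
    }
    return %params;
}

my %params = parse_query($ENV{QUERY_STRING});
```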
-Run the script as www user. Permissions: 554 (read + execute).
+Security.
-Running in OpenBSD chroot: Check Perl's dynamic object dependencies:
+ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.
-```
-$ ldd $(which perl)
-/usr/bin/perl:
- Start End Type Open Ref GrpRef Name
- 000008797e8e6000 000008797e8eb000 exe 1 0 0 /usr/bin/perl
- 0000087c1ffe5000 0000087c20396000 rlib 0 1 0 /usr/lib/libperl.so.26.0
- 0000087bf4508000 0000087bf4539000 rlib 0 2 0 /usr/lib/libm.so.10.1
- 0000087b9e801000 0000087b9e907000 rlib 0 2 0 /usr/lib/libc.so.102.0
- 0000087bba182000 0000087bba182000 ld.so 0 1 0 /usr/libexec/ld.so
-```
+ReDoS: sanitized user input, length-limited the search text, quoted
+metacharacters with `\Q$search_text\E`.
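
These guards can be sketched as one predicate (the 100-char cap and the sub
name are illustrative, not from the post):

```perl
use strict;
use warnings;

# ReDoS guard: cap the input length, then quote metacharacters so the
# search text is matched literally and can't drive backtracking.
sub safe_match {
    my ($content, $search_text) = @_;
    return 0 if length($search_text) > 100;   # illustrative cap
    return $content =~ /\Q$search_text\E/i ? 1 : 0;
}
```

With `\Q..\E`, a hostile pattern like `(a+)+$` is just a literal string, so it
can never match explosively.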
-Copy them over to chroot. Now should have /var/www/usr/bin/perl,
-/usr/lib/libperl.so.26.0, and so on.
+XSS: sanitized user input, escaped HTML output.
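
The escaping needs only one substitution; nothing like HTML::Entities ships in
the base system, so a tiny hand-rolled escaper (a sketch, names assumed)
covers the five special characters:

```perl
use strict;
use warnings;

# Minimal HTML escaper for text interpolated into the results page.
my %esc = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
           '"' => '&quot;', "'" => '&#39;');

sub escape_html {
    my ($s) = @_;
    $s =~ s/([&<>"'])/$esc{$1}/g;   # single pass: output isn't rescanned
    return $s;
}
```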
-Troubleshooting: look for issues in logs or try executing the script in chroot:
+Command injection: no exec()/system() calls. Non-privileged user (www).
-```
-$ cat /var/log/messages | grep slowcgi
-# chroot /var/www/ htdocs/path/to/script/script.cgi
-```
-The last command exposes any missing Perl modules in chroot and where to find
-them. Copy them over as well.
+Symlink attacks: File::Find doesn't follow symlinks (follow => 0). And the
+script runs in a chroot.
-```
-location "/cgi-bin/*" {
- fastcgi socket "/run/slowcgi.sock"
-}
-```
+Access controls: files 444 (read-only), directories and CGI script 554
+(read + execute).
-in httpd.conf routes queries to slowcgi.
+Verdict: O(n) search. Works on every conceivable browser. Good enough.
+Commit: [9fec793](https://git.asciimx.com/www/commit/?h=term&id=9fec793abe0a73e5cd502a1d1e935e2413b85079)