From b65dabd3a2c83404e6612ce881ba4ad32e0b1ccd Mon Sep 17 00:00:00 2001
From: Sadeep Madurange
Date: Tue, 30 Dec 2025 22:36:53 +0800
Subject: Readme.
---
 _site/log/search-with-cgi/index.html | 103 ++++++++++++++---------------------
 1 file changed, 42 insertions(+), 61 deletions(-)

(limited to '_site/log')

diff --git a/_site/log/search-with-cgi/index.html b/_site/log/search-with-cgi/index.html
index 060a282..71b9f23 100644
--- a/_site/log/search-with-cgi/index.html
+++ b/_site/log/search-with-cgi/index.html
@@ -49,46 +49,41 @@

SITE SEARCH USING PERL + CGI

29 DECEMBER 2025

-

Need a way to search the site–the number of articles is growing.

- -

Searching the site client-side using the RSS feed and JavaScript is not an option–it bloats the feed and breaks the site for Lynx and other text browsers.

- -

Perl’s great for text processing–especially regex work. A few lines of Perl could do a regex search and send the result back via CGI. OpenBSD httpd speaks CGI; Perl and slowcgi are in the base system. No dependencies. Works on every conceivable browser.

- -

Perl: traverse the directory recursively with File::Find. If the search text is found, grab the file name, title, and up to 50 chars of the first paragraph to include in the search result.

- -
find(sub {
-    if (open my $fh, '<', $_) {
-        my $content = do { local $/; <$fh> };
-        close $fh;
-
-        if ($content =~ /\Q$search_text\E/i) {
-            my ($title) = $content =~ /<title>(.*?)<\/title>/is;
-            $title ||= $File::Find::name;
-            my ($p_content) = $content =~ /<p[^>]*>(.*?)<\/p>/is;
-            my $snippet = $p_content || "";
-            $snippet =~ s/<[^>]*>//g;
-            $snippet =~ s/\s+/ /g;
-            $snippet = substr($snippet, 0, 50);
-            $snippet .= "..." if length($p_content || "") > 50;
-
-            push @results, {
-                path    => $File::Find::name,
-                title   => $title,
-                snippet => $snippet
-            };
-        }
-    }
+          

The number of articles on the site is growing. Need a way to search the site.

+ +

Searching the RSS feed client-side using JavaScript is not an option. That would make the feed much heavier and break the site for text-based web browsers like Lynx.

+ +

Not gonna use an inverted index–more than an evening’s effort, especially if I want partial matching. I want partial matching.

+ +

A few lines of Perl could do a regex search and send the result back via CGI. OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No dependencies.

+ +

Perl: traverse the directory with File::Find. If the search text is found, grab the file name, title, and up to 50 chars from the first paragraph to include in the search result.

+ +
find({
+    wanted => sub {
+        return unless -f $_ && $_ eq 'index.html';
+        # ... file reading ...
+        if ($content =~ /\Q$search_text\E/i) {
+            # Extract title, snippet
+            push @results, { 
+                path    => $File::Find::name,
+                title   => $title, 
+                snippet => $snippet 
+            };
+        }
+    },
+    follow => 0,
 }, $dir);
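
For reference, the callback with the elided parts filled in from the extraction code shown above. A sketch–assumes $search_text, $dir, and @results are already in scope.

use File::Find;

find({
    wanted => sub {
        # Only look at each article's index.html.
        return unless -f $_ && $_ eq 'index.html';

        # Slurp the whole file.
        open my $fh, '<', $_ or return;
        my $content = do { local $/; <$fh> };
        close $fh;

        if ($content =~ /\Q$search_text\E/i) {
            my ($title) = $content =~ /<title>(.*?)<\/title>/is;
            $title ||= $File::Find::name;

            # Up to 50 chars of the first paragraph, tags stripped.
            my ($p_content) = $content =~ /<p[^>]*>(.*?)<\/p>/is;
            my $snippet = $p_content || "";
            $snippet =~ s/<[^>]*>//g;
            $snippet =~ s/\s+/ /g;
            $snippet = substr($snippet, 0, 50);
            $snippet .= "..." if length($p_content || "") > 50;

            push @results, {
                path    => $File::Find::name,
                title   => $title,
                snippet => $snippet,
            };
        }
    },
    follow => 0,    # don't follow symlinks
}, $dir);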
 
-

Don’t need the Perl CGI module; httpd sets QUERY_STRING for the slowcgi script:

+

httpd sets the search text in the QUERY_STRING environment variable. Don’t need Perl’s CGI module.

my %params;
 if ($ENV{QUERY_STRING}) {
@@ -101,38 +96,24 @@ if ($ENV{QUERY_STRING}) {
 }
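
The parsing body falls outside this hunk. A hand-rolled version would look something like this–an assumption, not necessarily the script’s exact code: split on & and =, then percent-decode.

my %params;
if ($ENV{QUERY_STRING}) {
    for my $pair (split /&/, $ENV{QUERY_STRING}) {
        my ($key, $value) = split /=/, $pair, 2;
        for ($key, $value) {
            next unless defined;
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # percent-decoding
        }
        $params{$key} = $value if defined $key;
    }
}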
 
-

Run the script as www user. Permissions: 554 (read + execute).

+

Security.

-

Running in the OpenBSD chroot: check Perl’s dynamic object dependencies:

+

ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.

-
$ ldd $(which perl)
-/usr/bin/perl:
-        Start            End              Type  Open Ref GrpRef Name
-        000008797e8e6000 000008797e8eb000 exe   1    0   0      /usr/bin/perl
-        0000087c1ffe5000 0000087c20396000 rlib  0    1   0      /usr/lib/libperl.so.26.0
-        0000087bf4508000 0000087bf4539000 rlib  0    2   0      /usr/lib/libm.so.10.1
-        0000087b9e801000 0000087b9e907000 rlib  0    2   0      /usr/lib/libc.so.102.0
-        0000087bba182000 0000087bba182000 ld.so 0    1   0      /usr/libexec/ld.so
-
+

ReDoS: sanitized user input, length-limited the search text, quoted metacharacters with \Q$search_text\E.
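
Roughly–the 100-char cap and the q parameter name are assumptions:

my $search_text = $params{q} // '';
$search_text =~ s/[^[:print:]]//g;              # drop non-printable characters
$search_text = substr($search_text, 0, 100);    # length-limit the search text
# The match itself uses /\Q$search_text\E/i, so metacharacters stay literal.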

-

Copy them over to the chroot. Now there should be /var/www/usr/bin/perl, /var/www/usr/lib/libperl.so.26.0, and so on.

+

XSS: sanitized user input. Escaped HTML.
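
Escaping sketch–the response markup is an assumption, and mapping the filesystem path to a URL is left out. The point is that & gets escaped first and everything echoed back goes through it.

sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must come first
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

print "Content-Type: text/html\r\n\r\n";
print '<p>Results for "', escape_html($search_text), '":</p>', "\n";
for my $r (@results) {
    printf qq{<p><a href="%s">%s</a> %s</p>\n},
        escape_html($r->{path}), escape_html($r->{title}), escape_html($r->{snippet});
}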

-

Troubleshooting: look for issues in logs or try executing the script in chroot:

+

Command injection: no exec()/system() calls. Non-privileged user (www).

-
$ cat /var/log/messages | grep slowcgi
-# chroot /var/www/ htdocs/path/to/script/script.cgi
-
-

The last command exposes any missing Perl modules in chroot and where to find them. Copy them over as well.
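
To see where a module lives outside the chroot before copying it in–%INC maps each loaded module to its full path:

$ perl -MFile::Find -e 'print $INC{"File/Find.pm"}, "\n"'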

+

Symlink attacks: File::Find doesn’t follow symlinks (follow => 0). chroot.

-
location "/cgi-bin/*" {
-    fastcgi socket "/run/slowcgi.sock"
-}
-
+

Access controls: files 444; directories and the CGI script 554.

-

This location block in httpd.conf routes queries to slowcgi.

+

Verdict: O(n) speed. Works on every conceivable browser. Good enough.

+

Commit: 9fec793

-- cgit v1.2.3