Diffstat (limited to '_site/log/search-with-cgi/index.html')
| -rw-r--r-- | _site/log/search-with-cgi/index.html | 103 |
1 files changed, 42 insertions, 61 deletions
diff --git a/_site/log/search-with-cgi/index.html b/_site/log/search-with-cgi/index.html
index 060a282..71b9f23 100644
--- a/_site/log/search-with-cgi/index.html
+++ b/_site/log/search-with-cgi/index.html
@@ -49,46 +49,41 @@
   <h2 class="center" id="title">SITE SEARCH USING PERL + CGI</h2>
   <h6 class="center">29 DECEMBER 2025</h5>
   <br>
-  <div class="twocol justify"><p>Need a way to search site–number of articles are growing.</p>
-
-<p>Searching site client-side using the RSS feed and JavaScript is not an option–
-bloats the feed and breaks the site for Lynx and other text browsers.</p>
-
-<p>Perl’s great for text processing–especially regex work. Few lines of Perl
-could do a regex search and send the result back via CGI. OpenBSD httpd speaks
-CGI, Perl and slowcgi are in the base systems. No dependencies. Works on every
-conceivable browser.</p>
-
-<p>Perl: traverse the directory with File::Find recursively. If search text is
-found grab the file name, title and up to 50 chars of the first paragraph to
-include in the search result.</p>
-
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find(sub {
-  if (open my $fh, '<', $_) {
-    my $content = do { local $/; <$fh> };
-    close $fh;
-
-    if ($content =~ /\Q$search_text\E/i) {
-      my ($title) = $content =~ /<title>(.*?)<\/title>/is;
-      $title ||= $File::Find::name;
-      my ($p_content) = $content =~ /<p[^>]*>(.*?)<\/p>/is;
-      my $snippet = $p_content || "";
-      $snippet =~ s/<[^>]*>//g;
-      $snippet =~ s/\s+/ /g;
-      $snippet = substr($snippet, 0, 50);
-      $snippet .= "..." if length($p_content || "") > 50;
-
-      push @results, {
-        path => $File::Find::name,
-        title => $title,
-        snippet => $snippet
-      };
-    }
-  }
+  <div class="twocol justify"><p>Number of articles on the site are growing. Need a way to search site.</p>
+
+<p>Searching the RSS feed client-side using JavaScript is not an option. That
+would make the feed much heavier and break the site for text-based web browsers
+like Lynx.</p>
+
+<p>Not gonna use an inverted index–More than an evening’s effort, especially if I
+want partial matching. I want partial matching.</p>
+
+<p>Few lines of Perl could do a regex search and send the result back via CGI.
+OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No
+dependencies.</p>
+
+<p>Perl: traverse directory with File::Find. If search text is found grab the file
+name, title and up to 50 chars from the first paragraph to include in the
+search result.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find({
+  wanted => sub {
+    return unless -f $_ && $_ eq 'index.html';
+    # ... file reading ...
+    if ($content =~ /\Q$search_text\E/i) {
+      # Extract title, snippet
+      push @results, {
+        path => $File::Find::name,
+        title => $title,
+        snippet => $snippet
+      };
+    }
+  },
+  follow => 0,
 }, $dir);
 </code></pre></div></div>
 
-<p>Don’t need the Perl CGI module, httpd sets QUERY_STRING for the slowcgi script:</p>
+<p>httpd sets the search text in QUERY_STRING env. Don’t need Perl’s CGI module.</p>
 
 <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my %params;
 if ($ENV{QUERY_STRING}) {
@@ -101,38 +96,24 @@ if ($ENV{QUERY_STRING}) {
 }
 </code></pre></div></div>
 
-<p>Run the script as www user. Permissions: 554 (read + execute).</p>
+<p>Security.</p>
 
-<p>Running in OpenBSD chroot: Check Perl’s dynamic object dependencies:</p>
+<p>ReDOS, XSS, command injection, symlink attacks. Did I miss anything? Probably.</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ldd $(which perl)
-/usr/bin/perl:
-    Start            End              Type  Open Ref GrpRef Name
-    000008797e8e6000 000008797e8eb000 exe   1    0   0      /usr/bin/perl
-    0000087c1ffe5000 0000087c20396000 rlib  0    1   0      /usr/lib/libperl.so.26.0
-    0000087bf4508000 0000087bf4539000 rlib  0    2   0      /usr/lib/libm.so.10.1
-    0000087b9e801000 0000087b9e907000 rlib  0    2   0      /usr/lib/libc.so.102.0
-    0000087bba182000 0000087bba182000 ld.so 0    1   0      /usr/libexec/ld.so
-</code></pre></div></div>
+<p>ReDOS: sanitized user input, length-limit search text, quote metacharacters
+with <code class="language-plaintext highlighter-rouge">\Q$search_text\E</code>.</p>
 
-<p>Copy them over to chroot. Now should have /var/www/usr/bin/perl,
-/usr/lib/libperl.so.26.0, and so on.</p>
+<p>XSS: sanitized user input. Escaped HTML.</p>
 
-<p>Troubleshooting: look for issues in logs or try executing the script in chroot:</p>
+<p>Command injection: no exec()/system() calls. Non-privileged user (www).</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat /var/log/messages | grep slowcgi
-# chroot /var/www/ htdocs/path/to/script/script.cgi
-</code></pre></div></div>
-<p>The last command exposes any missing Perl modules in chroot and where to find
-them. Copy them over as well.</p>
+<p>Symlink attacks: File::Find don’t follow symlinks (follow => 0). chroot.</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>location "/cgi-bin/*" {
-  fastcgi socket "/run/slowcgi.sock"
-}
-</code></pre></div></div>
+<p>Access controls: files (444), directories and CGI script: 554.</p>
 
-<p>in httpd.conf routes queries to slowcgi.</p>
+<p>Verdict: O(n) speed. Works on every conceivable browser. Good enough.</p>
 
+<p>Commit: <a href="https://git.asciimx.com/www/commit/?h=term&id=9fec793abe0a73e5cd502a1d1e935e2413b85079">9fec793</a></p>
   </div>
   <p class="post-author right">by W. D. Sadeep Madurange</p>
   </div>
