Diffstat (limited to '_site/log/search-with-cgi/index.html')
-rw-r--r--  _site/log/search-with-cgi/index.html  103
1 file changed, 42 insertions(+), 61 deletions(-)
diff --git a/_site/log/search-with-cgi/index.html b/_site/log/search-with-cgi/index.html
index 060a282..71b9f23 100644
--- a/_site/log/search-with-cgi/index.html
+++ b/_site/log/search-with-cgi/index.html
@@ -49,46 +49,41 @@
<h2 class="center" id="title">SITE SEARCH USING PERL + CGI</h2>
<h6 class="center">29 DECEMBER 2025</h6>
<br>
- <div class="twocol justify"><p>Need a way to search site–number of articles are growing.</p>
-
-<p>Searching site client-side using the RSS feed and JavaScript is not an option–
-bloats the feed and breaks the site for Lynx and other text browsers.</p>
-
-<p>Perl’s great for text processing–especially regex work. Few lines of Perl
-could do a regex search and send the result back via CGI. OpenBSD httpd speaks
-CGI, Perl and slowcgi are in the base systems. No dependencies. Works on every
-conceivable browser.</p>
-
-<p>Perl: traverse the directory with File::Find recursively. If search text is
-found grab the file name, title and up to 50 chars of the first paragraph to
-include in the search result.</p>
-
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find(sub {
- if (open my $fh, '&lt;', $_) {
- my $content = do { local $/; &lt;$fh&gt; };
- close $fh;
-
- if ($content =~ /\Q$search_text\E/i) {
- my ($title) = $content =~ /&lt;title&gt;(.*?)&lt;\/title&gt;/is;
- $title ||= $File::Find::name;
- my ($p_content) = $content =~ /&lt;p[^&gt;]*&gt;(.*?)&lt;\/p&gt;/is;
- my $snippet = $p_content || "";
- $snippet =~ s/&lt;[^&gt;]*&gt;//g;
- $snippet =~ s/\s+/ /g;
- $snippet = substr($snippet, 0, 50);
- $snippet .= "..." if length($p_content || "") &gt; 50;
-
- push @results, {
- path =&gt; $File::Find::name,
- title =&gt; $title,
- snippet =&gt; $snippet
- };
- }
- }
+ <div class="twocol justify"><p>The number of articles on the site is growing. Need a way to search the site.</p>
+
+<p>Searching the RSS feed client-side using JavaScript is not an option. That
+would make the feed much heavier and break the site for text-based web browsers
+like Lynx.</p>
+
+<p>Not gonna use an inverted index. That's more than an evening's effort, especially if I
+want partial matching. And I want partial matching.</p>
+
+<p>A few lines of Perl can do a regex search and send the result back via CGI.
+OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No
+dependencies.</p>
+
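+<p>Routing is one location block in httpd.conf handing the CGI path to the slowcgi socket
+(the "/cgi-bin/*" prefix is an example; use whatever path the script lives under):</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># httpd.conf: hand requests under the CGI prefix to slowcgi
+location "/cgi-bin/*" {
+	fastcgi socket "/run/slowcgi.sock"
+}
+</code></pre></div></div>
+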
+<p>Perl: traverse the directory with File::Find. If the search text is found, grab the
+file name, title and up to 50 chars from the first paragraph to include in the
+search result.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find({
+ wanted =&gt; sub {
+ return unless -f $_ &amp;&amp; $_ eq 'index.html';
+ # ... file reading ...
+ if ($content =~ /\Q$search_text\E/i) {
+ # Extract title, snippet
+ push @results, {
+ path =&gt; $File::Find::name,
+ title =&gt; $title,
+ snippet =&gt; $snippet
+ };
+ }
+ },
+ follow =&gt; 0,
}, $dir);
</code></pre></div></div>
-<p>Don’t need the Perl CGI module, httpd sets QUERY_STRING for the slowcgi script:</p>
+<p>httpd sets the search text in QUERY_STRING env. Don’t need Perl’s CGI module.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my %params;
if ($ENV{QUERY_STRING}) {
@@ -101,38 +96,24 @@ if ($ENV{QUERY_STRING}) {
}
</code></pre></div></div>
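+<p>The parsing between those braces is a few lines of split-and-decode; roughly
+(a sketch, not the exact code from the script):</p>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># split key=value pairs out of QUERY_STRING and URL-decode them (sketch)
+for my $pair (split /&amp;/, $ENV{QUERY_STRING}) {
+    my ($key, $value) = split /=/, $pair, 2;
+    $value //= '';
+    $value =~ tr/+/ /;                              # '+' encodes a space
+    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # percent-decode
+    $params{$key} = $value;
+}
+</code></pre></div></div>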
-<p>Run the script as www user. Permissions: 554 (read + execute).</p>
+<p>Security.</p>
-<p>Running in OpenBSD chroot: Check Perl’s dynamic object dependencies:</p>
+<p>ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.</p>
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ldd $(which perl)
-/usr/bin/perl:
- Start End Type Open Ref GrpRef Name
- 000008797e8e6000 000008797e8eb000 exe 1 0 0 /usr/bin/perl
- 0000087c1ffe5000 0000087c20396000 rlib 0 1 0 /usr/lib/libperl.so.26.0
- 0000087bf4508000 0000087bf4539000 rlib 0 2 0 /usr/lib/libm.so.10.1
- 0000087b9e801000 0000087b9e907000 rlib 0 2 0 /usr/lib/libc.so.102.0
- 0000087bba182000 0000087bba182000 ld.so 0 1 0 /usr/libexec/ld.so
-</code></pre></div></div>
+<p>ReDoS: sanitized user input, length-limited the search text, quoted metacharacters
+with <code class="language-plaintext highlighter-rouge">\Q$search_text\E</code>.</p>
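+<p>Roughly (a sketch; the parameter name and the 100-char cap are examples, not
+what the script actually uses):</p>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># hypothetical param name 'q'; cap the length, then match literally
+my $search_text = substr($params{q} // '', 0, 100);   # example cap
+
+# \Q..\E quotes regex metacharacters so user input can't alter the pattern
+if ($content =~ /\Q$search_text\E/i) {
+    # ...
+}
+</code></pre></div></div>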
-<p>Copy them over to chroot. Now should have /var/www/usr/bin/perl,
-/usr/lib/libperl.so.26.0, and so on.</p>
+<p>XSS: sanitized user input. Escaped HTML.</p>
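+<p>The escaping is the usual handful of characters before anything user-derived goes
+back into the page; a sketch (the script's own helper may differ):</p>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># minimal HTML escape for user-derived text (sketch); escape '&amp;' first
+sub escape_html {
+    my ($s) = @_;
+    $s =~ s/&amp;/&amp;amp;/g;
+    $s =~ s/&lt;/&amp;lt;/g;
+    $s =~ s/&gt;/&amp;gt;/g;
+    $s =~ s/"/&amp;quot;/g;
+    $s =~ s/'/&amp;#39;/g;
+    return $s;
+}
+
+# e.g. when echoing the query back into the results page
+print escape_html($search_text);
+</code></pre></div></div>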
-<p>Troubleshooting: look for issues in logs or try executing the script in chroot:</p>
+<p>Command injection: no exec()/system() calls. Non-privileged user (www).</p>
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat /var/log/messages | grep slowcgi
-# chroot /var/www/ htdocs/path/to/script/script.cgi
-</code></pre></div></div>
-<p>The last command exposes any missing Perl modules in chroot and where to find
-them. Copy them over as well.</p>
+<p>Symlink attacks: File::Find doesn't follow symlinks (follow =&gt; 0). And everything runs inside httpd's chroot.</p>
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>location "/cgi-bin/*" {
- fastcgi socket "/run/slowcgi.sock"
-}
-</code></pre></div></div>
+<p>Access controls: files 444 (read-only), directories and the CGI script 554 (read + execute).</p>
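+<p>In chmod terms (the web root path and script name here are examples):</p>
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># example invocations; adjust paths to the actual web root and script
+$ find /var/www/htdocs -type f -exec chmod 444 {} +
+$ find /var/www/htdocs -type d -exec chmod 554 {} +
+$ chmod 554 /var/www/htdocs/cgi-bin/search.cgi
+</code></pre></div></div>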
-<p>in httpd.conf routes queries to slowcgi.</p>
+<p>Verdict: O(n) in the amount of content scanned per query. Works on every conceivable browser. Good enough.</p>
+<p>Commit: <a href="https://git.asciimx.com/www/commit/?h=term&amp;id=9fec793abe0a73e5cd502a1d1e935e2413b85079">9fec793</a></p>
</div>
<p class="post-author right">by W. D. Sadeep Madurange</p>
</div>