<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Site search using Perl + CGI</title>
<link rel="stylesheet" href="/assets/css/main.css">
<link rel="stylesheet" href="/assets/css/skeleton.css">
</head>
<body>
<div id="nav-container" class="container">
<ul id="navlist" class="left">
<li >
<a href="/" class="link-decor-none">hme</a>
</li>
<li class="active">
<a href="/log/" class="link-decor-none">log</a>
</li>
<li >
<a href="/projects/" class="link-decor-none">poc</a>
</li>
<li >
<a href="/about/" class="link-decor-none">abt</a>
</li>
<li>
<a href="/cgi-bin/find.cgi" class="link-decor-none">sws</a>
</li>
<li>
<a href="/feed.xml" class="link-decor-none">rss</a>
</li>
</ul>
</div>
<main>
<div class="container">
<div class="container-2">
<h2 class="center" id="title">SITE SEARCH USING PERL + CGI</h2>
<h6 class="center">29 DECEMBER 2025</h6>
<br>
<div class="twocol justify"><p>The number of articles on the site is growing. Need a way to search the site.</p>
<p>Searching the RSS feed client-side using JavaScript is not an option. That
would make the feed much heavier and break the site for text-based web browsers
like Lynx.</p>
<p>Not gonna use an inverted index. That is more than an evening’s effort,
especially if I want partial matching. I want partial matching.</p>
<p>A few lines of Perl can do a regex search and send the results back via CGI.
OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No
dependencies.</p>
<p>Perl: traverse the directory with File::Find. If the search text is found, grab
the file name, the title and up to 50 chars from the first paragraph to include
in the search result.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use File::Find;

find({
    wanted => sub {
        # Only regular files named index.html are searched.
        return unless -f $_ && $_ eq 'index.html';
        # ... file reading ...
        if ($content =~ /\Q$search_text\E/i) {
            # Extract title, snippet
            push @results, {
                path    => $File::Find::name,
                title   => $title,
                snippet => $snippet
            };
        }
    },
    follow => 0,
}, $dir);
</code></pre></div></div>
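<p>A minimal sketch of what the elided file-reading step could look like, not the
committed code. <code class="language-plaintext highlighter-rouge">read_page</code> and <code class="language-plaintext highlighter-rouge">extract_fields</code> are hypothetical names, and the
regexes assume simple, well-formed pages. The snippet is cut at 50 bytes unless
the file handle is decoded first.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use strict;
use warnings;

# Hypothetical helper: pull the title and a short snippet out of page HTML.
sub extract_fields {
    my ($content) = @_;
    my ($title) = $content =~ m{&lt;title[^>]*>(.*?)&lt;/title>}is;
    my ($para)  = $content =~ m{&lt;p\b[^>]*>(.*?)&lt;/p>}is;
    for my $text ($title, $para) {
        $text = '' unless defined $text;   # no match: fall back to empty
        $text =~ s/&lt;[^>]*>//g;             # drop inline tags
        $text =~ s/\s+/ /g;                # collapse whitespace
        $text =~ s/^ | $//g;               # trim
    }
    return ($title, substr($para, 0, 50));
}

# Hypothetical helper: returns (title, snippet), or an empty list when the
# path is a symlink or cannot be opened.
sub read_page {
    my ($path) = @_;
    return if -l $path;    # -f follows symlinks, so test for links explicitly
    open my $fh, '&lt;', $path or return;
    my $content = do { local $/; readline $fh };   # slurp the whole file
    close $fh;
    return extract_fields($content);
}
</code></pre></div></div>
<p>Inside <code class="language-plaintext highlighter-rouge">wanted</code> this would be called as <code class="language-plaintext highlighter-rouge">read_page($_)</code>, skipping the file when the
returned list is empty.</p>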
<p>httpd passes the search text in the QUERY_STRING environment variable. Don’t need Perl’s CGI module.</p>
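<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>my %params;
if ($ENV{QUERY_STRING}) {
    foreach my $pair (split /&/, $ENV{QUERY_STRING}) {
        # A limit of 2 keeps any '=' inside the value intact.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;   # e.g. "?q" with no '='
        $value =~ tr/+/ /;                   # '+' encodes a space
        # Decode %XX escapes into raw bytes.
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $params{$key} = $value;
    }
}
</code></pre></div></div>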
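<p>The parsed value still needs bounding before it reaches the regex. A rough
sketch, assuming the form field is called <code class="language-plaintext highlighter-rouge">q</code> and a limit of 64 characters.
Both are my assumptions here, as is the helper name <code class="language-plaintext highlighter-rouge">clean_query</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use strict;
use warnings;

use constant MAX_QUERY_LEN => 64;   # assumed limit, tune to taste

# Hypothetical helper: returns the cleaned search text, or nothing if the
# input is missing, empty or too long.
sub clean_query {
    my ($text) = @_;
    return unless defined $text;
    $text =~ s/[[:cntrl:]]+/ /g;    # control characters, newlines included
    $text =~ s/^\s+|\s+$//g;        # trim
    return if $text eq '' || length($text) > MAX_QUERY_LEN;
    return $text;
}

my %params = (q => ' a.c ');        # stand-in for the parsed QUERY_STRING
my $search_text = clean_query($params{q});
if (defined $search_text) {
    # \Q...\E makes the dot literal: matches "a.c", not "abc".
    print "match\n" if 'see a.c here' =~ /\Q$search_text\E/i;
}
</code></pre></div></div>
<p>Rejecting long input outright, rather than truncating it, keeps the behaviour
predictable: the user never gets results for a query they did not type.</p>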
<p>Security.</p>
<p>ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.</p>
<p>ReDoS: sanitized user input, length-limited the search text, quoted
metacharacters with <code class="language-plaintext highlighter-rouge">\Q$search_text\E</code>.</p>
<p>XSS: sanitized user input. Escaped HTML.</p>
<p>Command injection: no exec()/system() calls. Non-privileged user (www).</p>
<p>Symlink attacks: File::Find doesn’t follow symlinks (follow => 0). chroot.</p>
<p>Access controls: files 444; directories and CGI script 554.</p>
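<p>For the XSS item, one conservative way to escape output is to encode every ASCII
punctuation character, not just the usual five. A sketch under that assumption;
<code class="language-plaintext highlighter-rouge">escape_html</code> is a hypothetical name and not necessarily what the commit does.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use strict;
use warnings;

# Hypothetical helper: turn every ASCII punctuation character into a numeric
# character reference. The /a flag keeps [[:punct:]] to ASCII (Perl 5.14+).
sub escape_html {
    my ($text) = @_;
    $text =~ s/([[:punct:]])/sprintf('&amp;#%d;', ord $1)/gea;
    return $text;
}

# Anything echoed back to the browser goes through the helper first.
print escape_html('&lt;b>"hi" &amp; bye&lt;/b>'), "\n";
</code></pre></div></div>
<p>Over-encoding is harmless in element content and attribute values, and it avoids
having to remember which characters matter in which context.</p>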
<p>Verdict: O(n) per query, since every search reads every page. Works on every conceivable browser. Good enough.</p>
<p>Commit: <a href="https://git.asciimx.com/www/commit/?h=term&id=9fec793abe0a73e5cd502a1d1e935e2413b85079">9fec793</a></p>
</div>
<p class="post-author right">by W. D. Sadeep Madurange</p>
</div>
</div>
</main>
<div class="footer">
<div class="container">
<div class="twelve columns right container-2">
<p id="footer-text">© ASCIIMX - 2025</p>
</div>
</div>
</div>
</body>
</html>