---
title: Site search using Perl + CGI
date: 2025-12-29
layout: post
---
Number of articles on the site is growing. Need a way to search the site.

Searching the RSS feed client-side using JavaScript is not an option. That
would make the feed much heavier and break the site for text-based web browsers
like Lynx.

Not gonna use an inverted index. More than an evening's effort, especially if I
want partial matching. I want partial matching.

A few lines of Perl could do a regex search and send the result back via CGI.

OpenBSD httpd speaks FastCGI; slowcgi bridges that to plain CGI. Perl, httpd
and slowcgi are all in the base system. No dependencies.
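For reference, "sending the result back via CGI" comes down to printing header lines, a blank line, then the body on stdout. A minimal sketch; `cgi_response` is a hypothetical helper name, not something taken from the actual script:

```perl
use strict;
use warnings;

# Hypothetical helper: a CGI response is header lines, one blank line,
# then the body. Content-Type is the only header sent here.
sub cgi_response {
    my ($body) = @_;
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" . $body;
}

# In the script: print cgi_response($html);
```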
Perl: traverse the directory with File::Find. If the search text is found, grab
the file name, title and up to 50 chars from the first paragraph to include in
the search result.
```
use File::Find;

find({
    wanted => sub {
        # Only look at each article's index.html
        return unless -f $_ && $_ eq 'index.html';
        # ... file reading ...
        if ($content =~ /\Q$search_text\E/i) {
            # Extract title, snippet
            push @results, {
                path    => $File::Find::name,
                title   => $title,
                snippet => $snippet
            };
        }
    },
    follow => 0,
}, $dir);
```
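The title and snippet extraction is elided above. One way it might look, assuming each page has a `<title>` and at least one `<p>`, and that regexes are good enough for the site's own markup; `extract_title_and_snippet` is a made-up name for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the <title> text and up to 50 characters
# of the first <p>. Regex-based, so it assumes simple, well-formed HTML.
sub extract_title_and_snippet {
    my ($html) = @_;

    my ($title) = $html =~ m{<title\b[^>]*>(.*?)</title>}is;
    $title = '' unless defined $title;

    my ($para) = $html =~ m{<p\b[^>]*>(.*?)</p>}is;
    $para = '' unless defined $para;
    $para =~ s/<[^>]+>//g;       # drop inline tags such as <b> or <a>
    $para =~ s/\s+/ /g;          # collapse runs of whitespace
    $para =~ s/^\s+|\s+$//g;     # trim both ends

    return ($title, substr($para, 0, 50));
}
```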
httpd sets the search text in QUERY_STRING env. Don't need Perl's CGI module.
```
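my %params;
if ($ENV{QUERY_STRING}) {
    foreach my $pair (split /&/, $ENV{QUERY_STRING}) {
        # Limit the split to 2 so a literal '=' inside the value survives.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value //= '';    # "?q" with no '=' leaves $value undefined
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $params{$key} = $value;
    }
}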
```
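To see what the decoding does to a concrete query, here is the same logic as a function that takes the raw string instead of reading `%ENV`. `parse_query` and the sample query are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the same decoding steps, taking the raw
# query string as an argument so it can be tried outside of CGI.
sub parse_query {
    my ($qs) = @_;
    my %params;
    foreach my $pair (split /&/, defined $qs ? $qs : '') {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        $value =~ tr/+/ /;                                    # '+' means space
        $value =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;  # %XX to byte
        $params{$key} = $value;
    }
    return %params;
}

my %p = parse_query('q=perl+%26+cgi&page=2');
# $p{q} is "perl & cgi", $p{page} is "2"
```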
Security.

ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.

ReDoS: sanitized user input, length-limited search text, metacharacters quoted
with `\Q$search_text\E`.

XSS: sanitized user input. Escaped HTML.

Command injection: no exec()/system() calls. Non-privileged user (www).

Symlink attacks: File::Find doesn't follow symlinks (follow => 0). chroot.

Access controls: files 444, directories and CGI script 554.
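The input handling behind the ReDoS and XSS items could be sketched roughly like this. The 64-character limit and all three helper names are assumptions for illustration; the post does not give the real values:

```perl
use strict;
use warnings;

my $MAX_LEN = 64;    # assumed limit; the actual value is not stated

# Hypothetical: strip control characters, trim, and cap the length.
sub clean_search_text {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/[[:cntrl:]]//g;
    $text =~ s/^\s+|\s+$//g;
    return substr($text, 0, $MAX_LEN);
}

# \Q...\E quotes metacharacters, so the text only ever matches literally.
sub literal_regex {
    my ($text) = @_;
    return qr/\Q$text\E/i;
}

# Hypothetical: escape the five characters that matter in HTML output.
sub escape_html {
    my ($s) = @_;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
               '"' => '&quot;', "'" => '&#39;');
    $s =~ s/([&<>"'])/$ent{$1}/g;
    return $s;
}
```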
Verdict: O(n) scan over every page per query. Works on every conceivable
browser. Good enough.
Commit: [9fec793](https://git.asciimx.com/www/commit/?h=term&id=9fec793abe0a73e5cd502a1d1e935e2413b85079)