---
title: Site search using Perl + CGI
date: 2025-12-29
layout: post
---

The number of articles on the site is growing. Need a way to search the site.

Searching the RSS feed client-side using JavaScript is not an option. That
would make the feed much heavier and break the site for text-based web browsers
like Lynx.

Not gonna use an inverted index: more than an evening's effort, especially if I
want partial matching. I want partial matching.

Few lines of Perl could do a regex search and send the result back via CGI.
OpenBSD httpd speaks CGI. Perl and slowcgi are in the base system. No
dependencies.

Perl: traverse the directory with File::Find. If the search text is found, grab
the file name, title and up to 50 chars from the first paragraph to include in
the search result.

```
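use File::Find;

find({
    wanted => sub {
        # Only regular files named index.html are searched.
        return unless -f $_ && $_ eq 'index.html';
        # ... file reading ...
        # \Q...\E makes the search text match literally; /i ignores case.
        if ($content =~ /\Q$search_text\E/i) {
            # Extract title, snippet
            push @results, {
                path    => $File::Find::name,
                title   => $title,
                snippet => $snippet
            };
        }
    },
    follow => 0,    # do not follow symlinks
}, $dir);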
```
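A fuller sketch of the same traversal, with the elided file-reading step filled in. The helper name `search_site`, the `<title>` and `<p>` regexes, and the fallback to the path when no title is found are all assumptions for illustration, not necessarily what the committed script does.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns a list of { path, title, snippet } hashes for
# every index.html under $dir whose content contains $search_text.
sub search_site {
    my ($dir, $search_text) = @_;
    my @results;

    find({
        wanted => sub {
            return unless -f $_ && $_ eq 'index.html';

            # File::Find has already chdir'ed here, so $_ opens directly.
            open(my $fh, '<', $_) or return;
            my $content = do { local $/; <$fh> };   # slurp the whole file
            close($fh);
            return unless defined $content;

            return unless $content =~ /\Q$search_text\E/i;

            # Assumed markup: a <title> element and at least one <p> element.
            my ($title) = $content =~ m{<title>(.*?)</title>}is;
            my ($para)  = $content =~ m{<p[^>]*>(.*?)</p>}is;
            $title = $File::Find::name unless defined $title;
            $para  = '' unless defined $para;

            $para =~ s/<[^>]+>//g;      # drop inline tags
            $para =~ s/\s+/ /g;         # collapse runs of whitespace
            $para =~ s/^\s+|\s+$//g;    # trim both ends

            push @results, {
                path    => $File::Find::name,
                title   => $title,
                snippet => substr($para, 0, 50),   # up to 50 chars
            };
        },
        follow => 0,
    }, $dir);

    return @results;
}

# Usage (directory name is made up):
# printf "%s: %s\n", $_->{title}, $_->{snippet} for search_site('/htdocs/log', 'perl');
```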

httpd passes the search text in the QUERY_STRING environment variable. Don't
need Perl's CGI module.

```
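my %params;
if ($ENV{QUERY_STRING}) {
    foreach my $pair (split /&/, $ENV{QUERY_STRING}) {
        # Limit of 2 keeps any '=' inside the value intact.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;   # key given without "=value"
        $value =~ tr/+/ /;                   # '+' encodes a space
        # Decode %XX escapes into raw bytes.
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $params{$key} = $value;
    }
}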
```
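To see what the decoding does to a concrete input, the loop above can be wrapped in a function and fed a sample string. The name `parse_query` and the sample query are made up for illustration; the limit of 2 on the `=` split and the empty-value default are small hardening assumptions.

```perl
use strict;
use warnings;

# Hypothetical wrapper so the parser can be called on any string,
# not just $ENV{QUERY_STRING}.
sub parse_query {
    my ($query) = @_;
    my %params;
    return %params unless defined $query && length $query;

    foreach my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;   # keep '=' inside values
        next unless defined $key && length $key;
        $value = '' unless defined $value;         # key with no value
        $value =~ tr/+/ /;                         # '+' encodes a space
        $value =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/eg;
        $params{$key} = $value;
    }
    return %params;
}

# Sample input, made up for illustration.
my %p = parse_query('q=perl+%26+cgi&page=2&empty');
printf "%s=[%s]\n", $_, $p{$_} for sort keys %p;
# prints:
# empty=[]
# page=[2]
# q=[perl & cgi]
```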

Security.

ReDoS, XSS, command injection, symlink attacks. Did I miss anything? Probably.

ReDoS: sanitized user input, length-limited search text, quoted metacharacters
with `\Q$search_text\E`.
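A minimal sketch of those three steps. The 64-character cap, the control-character stripping and the helper name `clean_search_text` are assumptions; the post does not say what limit or sanitizing rules the real script uses.

```perl
use strict;
use warnings;

# Assumed cap; pick whatever suits the site.
my $MAX_LEN = 64;

# Hypothetical helper: returns the cleaned search string,
# or undef if nothing usable remains.
sub clean_search_text {
    my ($text) = @_;
    return unless defined $text;
    $text =~ s/[[:cntrl:]]//g;     # drop control characters
    $text =~ s/^\s+|\s+$//g;       # trim both ends
    return unless length $text;
    return substr($text, 0, $MAX_LEN);
}

# A classic backtracking-prone pattern, submitted as search text.
my $search_text = clean_search_text('(a+)+$') // '';

# \Q...\E escapes every metacharacter, so the input is matched as a
# literal string and cannot introduce nested quantifiers.
my $pattern = qr/\Q$search_text\E/i;

print "matches the literal text\n"  if     '(a+)+$' =~ $pattern;
print "does not act as a regex\n"   unless 'aaaa'   =~ $pattern;
```

With the metacharacters quoted, the match is a plain substring search, so the length cap mostly bounds the work per file rather than guarding against backtracking.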

XSS: sanitized user input. Escaped HTML.
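A sketch of the escaping step, assuming the title and snippet are held as plain text at this point. `escape_html` and `render_results` are hypothetical names, and the markup is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the five characters that matter in HTML
# text and in quoted attribute values.
sub escape_html {
    my ($s) = @_;
    $s = '' unless defined $s;
    my %ent = (
        '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    $s =~ s/([&<>"'])/$ent{$1}/g;
    return $s;
}

# Hypothetical renderer: one list item per result hash. Everything that
# came from the user or from a file goes through escape_html first.
sub render_results {
    my ($query, @results) = @_;
    my $html = '<p>Results for: ' . escape_html($query) . "</p>\n<ul>\n";
    for my $r (@results) {
        $html .= sprintf qq{<li><a href="%s">%s</a>: %s</li>\n},
            escape_html($r->{path}),
            escape_html($r->{title}),
            escape_html($r->{snippet});
    }
    return $html . "</ul>\n";
}

# A CGI response is a header block, a blank line, then the body.
print "Content-Type: text/html; charset=utf-8\r\n\r\n";
print render_results('<script>alert(1)</script>',
    { path => '/log/a/', title => 'Tom & Jerry', snippet => 'x < y' });
```

The query is echoed back as `&lt;script&gt;...`, so the browser shows it as text instead of running it.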

Command injection: no exec()/system() calls. Non-privileged user (www).

Symlink attacks: File::Find doesn't follow symlinks (follow => 0). chroot.

Access controls: files: 444, directories and CGI script: 554.

Verdict: O(n) per query, n being the total size of the files scanned. Works on
every conceivable browser. Good enough.

Commit: [9fec793](https://git.asciimx.com/www/commit/?h=term&id=9fec793abe0a73e5cd502a1d1e935e2413b85079)