From 0140adc68d7c46f98658c5dbc51b53ff332da4ef Mon Sep 17 00:00:00 2001
From: Sadeep Madurange
Date: Thu, 7 May 2026 19:19:44 +0800
Subject: Improve prose.

---
 _log/bumblebee.md        | 34 +++++++++++++---------------------
 _log/fpm-door-lock-rf.md |  2 +-
 _log/site-search.md      | 39 ++++++++++++++++++++++-----------------
 _log/vcs-1.md            | 21 ++++++++++-----------
 4 files changed, 46 insertions(+), 50 deletions(-)

diff --git a/_log/bumblebee.md b/_log/bumblebee.md
index bbe0490..a5e761d 100644
--- a/_log/bumblebee.md
+++ b/_log/bumblebee.md
@@ -1,40 +1,32 @@
 ---
-title: Built a browser automation script synthesizer
+title: Built a web script synthesizer
 date: 2025-04-02
 layout: post
 project: true
 thumbnail: thumb_sm.png
 ---

-One year at the trading firm. Webscrapers are causing problems. CPUs are
-saturated, servers are stalling.
+One year at the trading firm. Scripts are saturating CPUs, stalling servers;
+Forced to restart them.

-2025-02: Built a C# WinForms application to record browser sessions and
-automate the synthesis of scripts.
+2025-02: Built a tool to record browser sessions and synthesize better scripts.

-Hosted WebView2 (Edge) in the WinForms application to render web content.
+Stack: C# WinForms, WebView2 (Edge), Scintilla.NET editor.

-Intercepted events by injecting JS hooks to web pages (client-side events) and
-listening to WebView events (internal browser events). Converted intercepted
-events to Selenium code by sending through if-else blocks.
+Injected JS hooks, WebView2 and the editor generate events. If-else blocks
+convert them to Selenium code. Optimizer squashes multiple events into single
+commands (e.g., calendar clicks → text input), uses heuristics to improve DOM
+addressing (xpath, id, element).

-Implemented a basic optimizer to squash event sequences into single commands
-(e.g., calendar clicks → text input), use heuristics to improve DOM addressing
-(xpath, id, element).
+Two linear lists store events and code—no ASTs. Mid-session manual edits desync
+lists, block optimizer. Workaround: only edit scripts at the end of recording.

-Integrated Scintilla.NET editor to allow user more control over the generated
-script.
-
-Events and code are stored in two linear lists. Without ASTs, mid-session
-manual edits desync the lists, block the optimizer. As a workaround, only edit
-scripts at the end of recording.
-
-2025-03: Shipped the first iteration. Began work on a key optimization: bypass
-the browser, grab data files directly when possible.
+2025-03: Shipped the first iteration. Began work on key optimization: bypass
+the browser, grab data files directly.

 2025-04: Abandoned project. Left the firm.

diff --git a/_log/fpm-door-lock-rf.md b/_log/fpm-door-lock-rf.md
index 86ed5d3..1688a7d 100644
--- a/_log/fpm-door-lock-rf.md
+++ b/_log/fpm-door-lock-rf.md
@@ -12,7 +12,7 @@ Wanted to unlock the door with fingerprint, wirelessly to avoid drilling.
 lines of the transceivers to UART RXD/TXDs of the MCUs. Unreliable—constant
 packet loss.

-2025-01: Switched to RFM69 modules. Complete ball-ache to program. Followed the
+2025-01: Switched to RFM69 modules. Ball-ache to program. Followed the
 datasheet as well as I could, audited the code multiple times, cross-checked
 with RadioHead and RFM69 drivers. No luck.

diff --git a/_log/site-search.md b/_log/site-search.md
index e25c0fc..d4d7fe4 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -4,13 +4,13 @@ date: 2026-01-03
 layout: post
 ---

-Developed a suffix-array-based search engine my personal site. While a simple
-regex search was enough, couldn't resist the technical elegance of a proper
+Developed a suffix-array-based search engine. While a simple regex search
+would've been sufficient, couldn't resist the technical elegance of a proper
 index.

-Indexer: Indexer crawls the HTML, lowercases the text, and encodes it into
-UTF-8 bytes. Null byte sentinel marks the document boundaries;
-Lexicographically sorted 32-bit unsigned integer offsets are stored in sa.bin:
+Indexer crawls the HTML, lowercases the text, and encodes it into UTF-8 bytes.
+Null byte sentinels mark document boundaries; sa.bin stores lexicographically
+sorted 32-bit unsigned integer offsets:

 ```
 my @sa = 0 .. (length($corpus) - 1);
@@ -28,12 +28,14 @@ my @sa = 0 .. (length($corpus) - 1);
 32-bit offsets provide a 4 GB ceiling—overkill for a personal site, but
 comforting to have.

-O(L⋅N log N) sort is slow. 100 4.1 KB articles took 97.9s to index. L=64 fast
-path reduces that to 1.31s (L=16, 32, 64: 1.29-1.31s; 128, 256: 1.33-1.35s).
-Even with fast path optimization, indexer is unusable beyond 300 articles.
+O(L⋅N log N) sort is the bottleneck. 100 4.1 KB articles took 97.9s to index.
+L=64 fast path reduces that to 1.31s. Experimented with 16, 32, 128, and 256
+bytes; 64 was the sweet spot—lower values were marginally faster, higher ones
+marginally slower.

-Search: Textbook range query with two binary searches, hosted in a FastCGI
-process. Fixed-width offsets allow fast random access to the index:
+Implemented search using a textbook range query with two binary searches,
+hosted in a FastCGI process. Fixed-width offsets allow fast random access to
+the index:

 ```
 seek($fh_sa, $mid * 4, 0);
@@ -47,6 +49,13 @@ Seek + read outperformed mmap for <1k files. At 10k, mmap was occasionally
 faster (~200 µs), but consumed more memory—possibly due to OpenBSD’s VM
 security trade-offs. Results may vary by OS.

+Security: httpd, slowcgi, Perl are in the OpenBSD base system. Used file system
+permissions to govern access. Hardened the system by running it in chroot.
+
+Resource exhaustion and XSS attacks are inherent. Limited concurrent searches
+using lock-file semaphores, and capped the query length (64 B) and the result
+set (20). Mitigated XSS by HTML-escaping all output using HTML::Escape.
+
 Benchmarks: My articles have a 3.42 KB median, 3.43 KB mean, and 5.39 KB max.
 Benchmarked on T490 (i7-10510U, OpenBSD 7.8, article size: 4.1 KB) against
 linear regex search:
@@ -88,14 +97,10 @@ Index size | 103557.18 KB | N/A
 ----------------+----------------------+---------------------

-Security: httpd, slowcgi, Perl are in the OpenBSD base system. Used file system
-permissions to govern access. Hardened the system by running it in chroot.
-
-Resource exhaustion and XSS attacks are inherent. Limited concurrent searches
-using lock-file semaphores, and capped the query length (64 B) and the result
-set (20). Mitigated XSS by HTML-escaping all output using HTML::Escape.
+Search scales well—0.9 ms at 100 files, 8.8 ms at 5000. Indexing doesn't. 4.5s
+at 300 files is tolerable; 138s at 5000 is impractical.

-Next release: Incremental indexing + SA-IS, Anno Domini 2076.
+Warranty: 300 / 6 → 50 years.

 Commit: