author     Sadeep Madurange <sadeep@asciimx.com>  2026-05-04 09:01:42 +0800
committer  Sadeep Madurange <sadeep@asciimx.com>  2026-05-07 09:57:01 +0800
commit     35c999ea426a5f0b32e2b5d5bab5fe010ea9ce97
tree       f5365289bf1fc4d9d660b9699b3f60a790dfa1ed
parent     fb15536326a8d5b3187981b0036df65f4bb60996
Improve accuracy of framing.
-rw-r--r--  _log/bumblebee.md            | 19
-rw-r--r--  _log/etlas.md                |  6
-rw-r--r--  _log/neo4j-a-star-search.md  |  8
-rw-r--r--  _log/site-search.md          | 92
-rw-r--r--  _log/vcs-1.md                |  5
-rw-r--r--  index.md                     |  6
6 files changed, 66 insertions, 70 deletions
diff --git a/_log/bumblebee.md b/_log/bumblebee.md
index 21a9a07..bbe0490 100644
--- a/_log/bumblebee.md
+++ b/_log/bumblebee.md
@@ -1,16 +1,16 @@
---
-title: Built a browser session script synthesizer
+title: Built a browser automation script synthesizer
date: 2025-04-02
layout: post
project: true
thumbnail: thumb_sm.png
---
-One year at trading firm. Webscraper is giving too many problems. CPUs are
+One year at the trading firm. Webscrapers are causing problems. CPUs are
saturated, servers are stalling.
-2025-02: Built Bumblebee, a C# WinForms application, to record browser sessions
-and automate the synthesis of scripts.
+2025-02: Built a C# WinForms application to record browser sessions and
+automate the synthesis of scripts.
<video style="max-width:100%; margin-bottom: 10px" controls="" poster="poster.png">
<source src="bee.mp4" type="video/mp4">
@@ -20,8 +20,7 @@ Hosted WebView2 (Edge) in the WinForms application to render web content.
Intercepted events by injecting JS hooks to web pages (client-side events) and
listening to WebView events (internal browser events). Converted intercepted
-events to Selenium code by sending through if-else statements. Crude—no time
-for something better.
+events to Selenium code by passing them through if-else blocks.
Implemented a basic optimizer to squash event sequences into single commands
(e.g., calendar clicks → text input), use heuristics to improve DOM addressing
@@ -30,11 +29,11 @@ Implemented a basic optimizer to squash event sequences into single commands
Integrated Scintilla.NET editor to allow user more control over the generated
script.
-Events and code are stored in two linear lists. Mid-session manual edits desync
-the lists, block the optimizer. ASTs are overkill for now. As a workaround,
-only edit scripts at the end of recording.
+Events and code are stored in two linear lists. Without ASTs, mid-session
+manual edits desync the lists and block the optimizer. As a workaround, only
+edit scripts at the end of recording.
-2025-03: Shipped the first iteration and began work on key optimization: bypass
+2025-03: Shipped the first iteration. Began work on a key optimization: bypass
the browser, grab data files directly when possible.
2025-04: Abandoned project. Left the firm.
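The Bumblebee entry describes two mechanisms: an if-else translation from intercepted events to Selenium code, and an optimizer that squashes calendar clicks into a single text input. A minimal Python sketch of both follows. It is an illustration only, not the C# implementation: the `Event` record, the event kinds (`calendar-open`, `calendar-nav`, `calendar-day`, `click`, `input`, `navigate`), and the choice to emit Selenium's Python bindings are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical intercepted event: what happened, where, and any payload.
    kind: str
    selector: str = ""
    value: str = ""


# Hypothetical date-picker events that carry no information once a date is picked.
CALENDAR_NOISE = {"calendar-open", "calendar-nav"}


def squash(events):
    """Collapse date-picker interactions into one text input per field."""
    out = []
    for ev in events:
        if ev.kind in CALENDAR_NOISE:
            continue  # opening the picker and paging months are implied by the date
        if ev.kind == "calendar-day":
            ev = Event("input", ev.selector, ev.value)  # type the date instead
        if (out and ev.kind == "input" and out[-1].kind == "input"
                and out[-1].selector == ev.selector):
            out[-1] = ev  # repeated input into the same field: last value wins
        else:
            out.append(ev)
    return out


def to_selenium(ev):
    """Translate one event into a line of Selenium (Python bindings) by if-else."""
    if ev.kind == "navigate":
        return f"driver.get({ev.value!r})"
    elif ev.kind == "click":
        return f"driver.find_element(By.CSS_SELECTOR, {ev.selector!r}).click()"
    elif ev.kind == "input":
        return (f"el = driver.find_element(By.CSS_SELECTOR, {ev.selector!r}); "
                f"el.clear(); el.send_keys({ev.value!r})")
    else:
        return f"# unsupported event: {ev.kind}"


# Usage: a recorded session with a date picker, squashed and translated.
events = [
    Event("navigate", value="https://example.com/report"),
    Event("calendar-open", "#from-date"),
    Event("calendar-nav", "#from-date"),
    Event("calendar-day", "#from-date", "2025-02-01"),
    Event("click", "#download"),
]
script = [to_selenium(ev) for ev in squash(events)]
print("\n".join(script))
```

Replacing the whole date-picker interaction with one `send_keys` call is the kind of squash the entry mentions; it assumes the date field accepts typed input, which not every picker does.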
diff --git a/_log/etlas.md b/_log/etlas.md
index 9fd5ded..54745f6 100644
--- a/_log/etlas.md
+++ b/_log/etlas.md
@@ -68,9 +68,9 @@ KB RAM. Deployed a simple Flask API on VPS to manage the watchlist and relay
the feed. Wrapped the API in FastCGI and exposed it through chroot-ed
htpasswd + slowcgi + httpd—battle-tested OpenBSD base-system tools.
-Rolled my own stepped graph for simplicity, but the code is hideous. Needed
-vTaskDelay() to prevent the watchdog timer from triggering. Will look into
-Bresenham’s in a future revision.
+Custom stepped graph works, but the code is crude. vTaskDelay() is needed to
+keep the watchdog timer from triggering. Revisit with Bresenham's line
+algorithm.
News: Used Channel NewsAsia RSS feed for news. Hand-coded the XML parsing in
C—no Flask backend at the time. Now that I have one for stocks, will move the
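The etlas entry plans to revisit the stepped graph with Bresenham's line algorithm. Below is a sketch of the standard all-octant, integer-only variant, written in Python for readability although the device code is C. Returning a list of points instead of writing into a display buffer is an assumption made to keep the example self-contained.

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer pixels on the line from (x0, y0) to (x1, y1).

    All-octant Bresenham: only integer addition, doubling and comparison.
    """
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # running error between the pixel path and the ideal line
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # error allows a step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # error allows a step along y
            err += dx
            y0 += sy
    return points


# Usage: connect consecutive samples of a hypothetical price series,
# given as (x, y) pixel coordinates.
samples = [(0, 5), (4, 3), (8, 7)]
pixels = []
for (xa, ya), (xb, yb) in zip(samples, samples[1:]):
    seg = bresenham(xa, ya, xb, yb)
    pixels.extend(seg if not pixels else seg[1:])  # skip the shared endpoint
print(pixels)
```

Each step costs a few integer operations, so it should be cheaper than floating-point interpolation; whether that removes the need for vTaskDelay() would have to be measured on the device.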
diff --git a/_log/neo4j-a-star-search.md b/_log/neo4j-a-star-search.md
index 8790731..6664421 100644
--- a/_log/neo4j-a-star-search.md
+++ b/_log/neo4j-a-star-search.md
@@ -1,13 +1,13 @@
---
-title: Contributed A* search to Neo4J
+title: Contributed A* search to Neo4J algorithms
date: 2018-03-06
layout: post
---
Written in 2026, backdated to 2018.
-Before v3.4.0, Neo4J shipped with Dijkstra's shortest path search. The
-algorithm was too slow for our marine vessel tracking application.
+Before v3.4.0, the Neo4J algorithms plugin shipped with Dijkstra's shortest path
+search. The algorithm was too slow for our marine vessel tracking application.
Forked and added A* search. Used the haversine function to steer the search:
@@ -52,7 +52,7 @@ Upstreamed the changes.
GitHub release: <a
href="https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases/tag/3.4.0.0"
-class="external" target="_blank" rel="noopener noreferrer">Neo4J v3.4.0</a> |
+class="external" target="_blank" rel="noopener noreferrer">neo4j-contrib v3.4.0</a> |
<a
href="https://github.com/neo4j-contrib/neo4j-graph-algorithms/blob/bd9732d9a690319552e134708692acb5a0d6b37c/algo/src/main/java/org/neo4j/graphalgo/impl/ShortestPathAStar.java"
class="external" target="_blank" rel="noopener noreferrer">Full source</a>.
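The Neo4J entry steers A* with the haversine function. Here is a minimal Python sketch of the idea, not the upstreamed Java `ShortestPathAStar`. Nodes carry `(lat, lon)` in degrees, the graph is a plain adjacency dict, and edge weights are assumed to be great-circle kilometres. Under that assumption the haversine distance to the goal never overestimates the remaining cost, so the first time the goal is popped its cost is optimal.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def a_star(coords, edges, start, goal):
    """A* over {node: [(neighbour, cost_km), ...]}; returns (cost, path)."""
    open_heap = [(haversine(coords[start], coords[goal]), 0.0, start)]
    best = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if g > best.get(node, math.inf):
            continue  # stale entry: a cheaper route to node was found later
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g, path[::-1]
        for nxt, cost in edges.get(node, []):
            ng = g + cost
            if ng < best.get(nxt, math.inf):
                best[nxt] = ng
                parent[nxt] = node
                # f = cost so far + haversine estimate of the remaining distance
                heapq.heappush(open_heap, (ng + haversine(coords[nxt], coords[goal]), ng, nxt))
    return math.inf, []


# Usage: hypothetical waypoints, not real vessel data.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (1.0, 0.5), "D": (0.0, 2.0)}
edges = {}
for u, v in [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]:
    d = haversine(coords[u], coords[v])  # edge weight = great-circle km
    edges.setdefault(u, []).append((v, d))
    edges.setdefault(v, []).append((u, d))
cost, path = a_star(coords, edges, "A", "D")
print(path, round(cost, 1))  # ['A', 'B', 'D'] 222.4
```

With a zero heuristic the same loop degrades to Dijkstra's algorithm; the haversine term is what pulls the search toward the goal instead of expanding evenly in every direction.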
diff --git a/_log/site-search.md b/_log/site-search.md
index 7b916fe..c1d4c12 100644
--- a/_log/site-search.md
+++ b/_log/site-search.md
@@ -1,5 +1,5 @@
---
-title: Overengineered search
+title: Under-engineered search
date: 2026-01-03
layout: post
---
@@ -8,15 +8,14 @@ Developed a suffix-array-based search engine for the site today. While a simple
regex search was enough, couldn't resist the technical elegance of a proper
index.
-Indexer: Implemented the indexer in Perl to crawl the HTML, lowercase the text,
-and encode it into UTF-8 bytes. Used a null byte sentinel to mark document
-boundaries and stored the lexicographically sorted 32-bit unsigned integer
-offsets to sa.bin:
+Indexer: Crawls the HTML, lowercases the text, and encodes it into
+UTF-8 bytes. A null byte sentinel marks document boundaries;
+lexicographically sorted 32-bit unsigned integer offsets are stored in sa.bin:
```
my @sa = 0 .. (length($corpus) - 1);
{
- use bytes; # Force compare 8-bit Unicode value comparisons
+ use bytes; # Compare raw bytes, not characters
@sa = sort {
# First 64 bytes check (fast path)
(substr($corpus, $a, 64) cmp substr($corpus, $b, 64)) ||
@@ -29,13 +28,8 @@ my @sa = 0 .. (length($corpus) - 1);
32-bit offsets provide a 4 GB ceiling—overkill for a personal site, but
comforting to have.
-It takes about 50ms to index my 12-entry website on a T490. As the site grows,
-the O(L⋅N log N) sort could become a bottleneck. So I introduced a fast path
-that caps L at 64 bytes—roughly the size of a cache line on common hardware.
-
-Search: Implemented the search in a FastCGI script as a textbook range query
-with two binary searches. Leveraged the fixed-width offsets for fast random
-access to the index:
+Search: Textbook range query (two binary searches) hosted in a FastCGI
+process. Fixed-width offsets enable fast random access to the index:
```
seek($fh_sa, $mid * 4, 0);
@@ -45,9 +39,9 @@ seek($fh_cp, $off, 0);
read($fh_cp, my $text, $query_len);
```
-Chose seek + read over mmap because it outperformed mmap for <1k files. At
-10k, mmap was occasionally faster (~200 µs), but used more memory—possibly
-due to OpenBSD’s VM security trade-offs. Results may vary by OS.
+Seek + read outperformed mmap for <1k files. At 10k, mmap was occasionally
+faster (~200 µs), but consumed more memory—possibly due to OpenBSD’s VM
+security trade-offs. Results may vary by OS.
Benchmarked on T490 (i7-10510U, OpenBSD 7.8, article size: 16 KB) against
linear regex search:
@@ -55,38 +49,38 @@ linear regex search:
<pre class="pre-no-style">
=============================================================
SEARCH BENCHMARK: Suffix array vs. Linear regex
-ARTICLE SIZE: 16 KB
+ARTICLE SIZE: 8 KB
=============================================================
-500 files:
--------------------------------------------------------------
-METRIC | SA | REGEX
+500 files (Targeting: keyword_-1):
+----------------+----------------------+---------------------
+METRIC | SA | REGEX
+----------------+----------------------+---------------------
+Search time | 0.0014s | 0.0451s
+Peak RAM | 8124 KB | 9612 KB
+Indexing time | 18.1865s | N/A
+Index size | 19610.39 KB | N/A
+----------------+----------------------+---------------------
+
+1000 files (Targeting: keyword_-1):
----------------+----------------------+---------------------
-Search time | 0.0012s | 0.0407s
-Peak RAM | 8828 KB | 9136 KB
-Indexing time | 0.1475s | N/A
-Index size | 204.94 KB | N/A
--------------------------------------------------------------
-
-1,000 files:
--------------------------------------------------------------
-METRIC | SA | REGEX
+METRIC | SA | REGEX
----------------+----------------------+---------------------
-Search time | 0.0019s | 0.0795s
-Peak RAM | 8980 KB | 9460 KB
-Indexing time | 0.3101s | N/A
-Index size | 410.51 KB | N/A
--------------------------------------------------------------
-
-10,000 files:
--------------------------------------------------------------
-METRIC | SA | REGEX
+Search time | 0.0021s | 0.0918s
+Peak RAM | 8280 KB | 9960 KB
+Indexing time | 43.1748s | N/A
+Index size | 39225.06 KB | N/A
+----------------+----------------------+---------------------
+
+10000 files (Targeting: keyword_-1):
+----------------+----------------------+---------------------
+METRIC | SA | REGEX
+----------------+----------------------+---------------------
+Search time | 0.0173s | 1.1275s
+Peak RAM | 11848 KB | 13392 KB
+Indexing time | 663.3909s | N/A
+Index size | 392263.01 KB | N/A
----------------+----------------------+---------------------
-Search time | 0.0161s | 0.9120s
-Peak RAM | 12504 KB | 12804 KB
-Indexing time | 10.9661s | N/A
-Index size | 4163.44 KB | N/A
--------------------------------------------------------------
</pre>
Security: httpd, slowcgi, Perl are in the OpenBSD base system. Used file system
@@ -96,13 +90,17 @@ Resource exhaustion and XSS attacks are inherent. Limited concurrent searches
using lock-file semaphores, and capped the query length (64 B) and the result
set (20). Mitigated XSS by HTML-escaping all output using HTML::Escape.
-At six articles a year, this should work for the next 1600 years. Penciled in
-the next release for Anno Domini 3626.
+Performance: Without SA-IS, indexing is slow. With the naive O(L⋅N log N)
+sort, 100 articles (8 KB each) took 6.58 minutes to index. The L=64 fast path
+cuts that to 2.69 seconds (L=16, 32, 64: 2.68-2.69s; 128, 256: 2.75-2.77s).
+Even so, 43.1748s to index 1000 articles is untenable.
+
+I under-engineered search.
Commit: <a
href="https://git.asciimx.com/www/commit/?h=term&id=6da102d6e0494a3eac3f05fa3b2cdcc25ba2754e"
class="external" target="_blank" rel="noopener noreferrer">6da102d</a> |
Benchmarks: <a
-href="https://git.asciimx.com/site-search-bm/commit/?id=8a4da6809cf9368cd6a5dd7351181ea4256453f9"
-class="external" target="_blank" rel="noopener noreferrer">8a4da68</a>
+href="https://git.asciimx.com/site-search-bm/commit/?id=de9d82e8074c9b67a04989f9b6be62890b7c95bb"
+class="external" target="_blank" rel="noopener noreferrer">de9d82e</a>
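The site-search entry builds a suffix array over a null-separated, lowercased UTF-8 corpus, sorts offsets with a 64-byte fast path, stores them as fixed-width 32-bit integers, and answers queries with two binary searches that seek into the index. A minimal Python sketch of that pipeline follows; it is not the Perl indexer or `find.cgi`. HTML crawling is omitted, the little-endian `<I` layout of the offsets is an assumption, and in-memory `io.BytesIO` objects stand in for sa.bin and the corpus file.

```python
import io
import struct
from functools import cmp_to_key

PREFIX = 64  # fast-path compare length (the post's L=64 cap)


def build_sa(corpus):
    """Naive suffix array: every byte offset, sorted by the suffix starting there."""
    def cmp(a, b):
        # Fast path: most suffixes differ within their first PREFIX bytes.
        pa, pb = corpus[a:a + PREFIX], corpus[b:b + PREFIX]
        if pa != pb:
            return -1 if pa < pb else 1
        # Slow path: shared 64-byte prefix, compare the full suffixes.
        fa, fb = corpus[a:], corpus[b:]
        return (fa > fb) - (fa < fb)
    return sorted(range(len(corpus)), key=cmp_to_key(cmp))


def read_offset(fh_sa, i):
    # Assumed layout: little-endian unsigned 32-bit offsets, entry i at byte 4 * i.
    fh_sa.seek(i * 4)
    return struct.unpack("<I", fh_sa.read(4))[0]


def find_range(fh_sa, n, fh_cp, query):
    """Return [lo, hi) of index entries whose suffix starts with query."""
    def head(i):
        # First len(query) bytes of the i-th smallest suffix.
        fh_cp.seek(read_offset(fh_sa, i))
        return fh_cp.read(len(query))

    lo, hi = 0, n
    while lo < hi:  # lower bound: first head >= query
        mid = (lo + hi) // 2
        if head(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, n
    while lo < hi:  # upper bound: first head > query
        mid = (lo + hi) // 2
        if head(mid) <= query:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


# Usage: two tiny "documents", lowercased, UTF-8 encoded, null-terminated.
docs = ["Hello World", "Yellow Submarine"]
corpus = b"".join(d.lower().encode("utf-8") + b"\0" for d in docs)
sa = build_sa(corpus)
sa_bin = io.BytesIO(struct.pack(f"<{len(sa)}I", *sa))  # stand-in for sa.bin
lo, hi = find_range(sa_bin, len(sa), io.BytesIO(corpus), b"llo")
for off in sorted(sa[lo:hi]):
    doc_id = corpus.count(b"\0", 0, off)  # sentinels before off = document index
    print(doc_id, off)  # prints "0 2" then "1 14"
```

The two loops find the first suffix whose leading bytes are `>= query` and the first whose leading bytes are `> query`; every entry between them starts with the query. Mapping an offset back to its document by counting sentinels is a linear scan, acceptable for an example, though a real index would likely precompute document boundaries.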
diff --git a/_log/vcs-1.md b/_log/vcs-1.md
index f7520df..ac74ad5 100644
--- a/_log/vcs-1.md
+++ b/_log/vcs-1.md
@@ -113,10 +113,7 @@ from 1,300 to 1,462.
Then fell the GC hammer. Inodes: 41. Space recovered: 8.4 MB.
-Urn's sequential IO and reduced write frequency are theoretically gentler on
-NAND. Git's dramatic GC pass (12 MB → 3.8 MB) incurs SSD wear Urn likely
-avoids. Precise impact on TBW and write amplification, however, remains
-unknown.
+Precise impact on TBW and write amplification remains unknown.
Commit: <a
href="https://git.asciimx.com/urn/commit/?id=79d9ec2bdef0a82172fa0aa56f12004bef206c04"
diff --git a/index.md b/index.md
index 4ae019e..4e36276 100644
--- a/index.md
+++ b/index.md
@@ -14,6 +14,8 @@ title: "Home"
</ul>
<footer>
- <p>A journal of personal projects and experiments. <a href="/cgi-bin/find.cgi">Search</a></p>
- <p>Built with <a href="https://github.com/ronv/minimalist" class="external" target="_blank" rel="noopener noreferrer">Minimalist</a></p>
+ <p>Built with <a href="https://github.com/ronv/minimalist" class="external"
+ target="_blank" rel="noopener noreferrer">Minimalist</a>.
+ <a href="/cgi-bin/find.cgi">Search</a>
+ </p>
</footer>