From 4c010f5b96e1326ff4f6eb4bc44e5e397c27d72f Mon Sep 17 00:00:00 2001
From: Sadeep Madurange
Date: Fri, 24 Apr 2026 16:00:54 +0800
Subject: Improve Bumblebee post.

---
 _log/bumblebee.md | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/_log/bumblebee.md b/_log/bumblebee.md
index c9c15f8..39fba46 100644
--- a/_log/bumblebee.md
+++ b/_log/bumblebee.md
@@ -1,34 +1,43 @@
 ---
-title: Bumblebee
+title: Built a browser session script synthesizer
 date: 2025-04-02
 layout: post
 project: true
 thumbnail: thumb_sm.png
 ---

-One year at trading firm—brittle ETL jobs, scripts max out CPUs, €0.4 mil/month
-cloud costs. Budget cuts make hardware bailouts no longer viable. Asked to
-restart stalling servers.
+One year at trading firm. Webscraper jobs are giving too many problems. Old
+scripts are prone to failure, new scripts take too long to write.

-2025-02: Built Bumblebee to record browser sessions and synthesize scripts.
+2025-02: Built Bumblebee, a C# WinForms application, to record browser sessions
+and automate the synthesis of scripts.

-Stack: C# WinForms, WebView2 (Edge), Scintilla.NET editor.
+Hosted WebView2 (Edge) in the WinForms application to render web content.

-Data flow: injected JS hooks + browser + editor → events → code → optimizer
-→ editor.
+Intercepted events by injecting JS hooks to web pages (client-side events) and
+listening to WebView events (internal browser events). Converted intercepted
+events to Selenium code by sending through if-else statements. Crude—no time
+for something better.

-Two linear lists store events and code. Mid-session manual edits desync lists,
-block optimizer. ASTs are overkill. As a workaround, only edit scripts at the
-end of recording.
+Implemented a basic optimizer to squash event sequences into single commands
+(e.g., calendar clicks → text input), use heuristics to improve DOM addressing
+(xpath, id, element).

-2025-03: First iteration shipped. Optimizer squashes event sequences into
-single commands (e.g., calendar clicks → text input). Better DOM addressing
-(xpath, id, element) using heuristics. Began work on key optimization: bypass
-browser, grab data file directly when possible.
+Integrated Scintilla.NET editor to allow user more control over the generated
+script.

-2025-04: Abandoned project—left the firm.
+Events and code are stored in two linear lists. Mid-session manual edits desync
+the lists, block the optimizer. ASTs are overkill for now. As a workaround,
+only edit scripts at the end of recording.
+
+2025-03: Shipped the first iteration. Well-received by colleagues.
+
+CPUs already saturated. Systems stalling. Began work on key optimization:
+bypass the browser, grab data files directly when possible.
+
+2025-04: Abandoned project. Left the firm.

-- 
cgit v1.2.3