| author | Sadeep Madurange <sadeep@asciimx.com> | 2026-04-24 16:00:54 +0800 |
|---|---|---|
| committer | Sadeep Madurange <sadeep@asciimx.com> | 2026-04-24 16:02:25 +0800 |
| commit | 4c010f5b96e1326ff4f6eb4bc44e5e397c27d72f (patch) | |
| tree | 0f40d2dbefc375bd0751df9d9c72d97aa3e05c8a /_log/bumblebee.md | |
| parent | 7ced3600d95d622d0543730aeae90d5ced335f03 (diff) | |
| download | www-4c010f5b96e1326ff4f6eb4bc44e5e397c27d72f.tar.gz | |
Improve Bumblebee post.
Diffstat (limited to '_log/bumblebee.md')
| -rw-r--r-- | _log/bumblebee.md | 41 |
1 files changed, 25 insertions, 16 deletions
diff --git a/_log/bumblebee.md b/_log/bumblebee.md
index c9c15f8..39fba46 100644
--- a/_log/bumblebee.md
+++ b/_log/bumblebee.md
@@ -1,34 +1,43 @@
 ---
-title: Bumblebee
+title: Built a browser session script synthesizer
 date: 2025-04-02
 layout: post
 project: true
 thumbnail: thumb_sm.png
 ---
 
-One year at trading firm—brittle ETL jobs, scripts max out CPUs, €0.4 mil/month
-cloud costs. Budget cuts make hardware bailouts no longer viable. Asked to
-restart stalling servers.
+One year at trading firm. Webscraper jobs are giving too many problems. Old
+scripts are prone to failure, new scripts take too long to write.
 
-2025-02: Built Bumblebee to record browser sessions and synthesize scripts.
+2025-02: Built Bumblebee, a C# WinForms application, to record browser sessions
+and automate the synthesis of scripts.
 
 <video style="max-width:100%; margin-bottom: 10px" controls="" poster="poster.png">
   <source src="bee.mp4" type="video/mp4">
 </video>
 
-Stack: C# WinForms, WebView2 (Edge), Scintilla.NET editor.
+Hosted WebView2 (Edge) in the WinForms application to render web content.
 
-Data flow: injected JS hooks + browser + editor → events → code → optimizer
-→ editor.
+Intercepted events by injecting JS hooks to web pages (client-side events) and
+listening to WebView events (internal browser events). Converted intercepted
+events to Selenium code by sending through if-else statements. Crude—no time
+for something better.
 
-Two linear lists store events and code. Mid-session manual edits desync lists,
-block optimizer. ASTs are overkill. As a workaround, only edit scripts at the
-end of recording.
+Implemented a basic optimizer to squash event sequences into single commands
+(e.g., calendar clicks → text input), use heuristics to improve DOM addressing
+(xpath, id, element).
 
-2025-03: First iteration shipped. Optimizer squashes event sequences into
-single commands (e.g., calendar clicks → text input). Better DOM addressing
-(xpath, id, element) using heuristics. Began work on key optimization: bypass
-browser, grab data file directly when possible.
+Integrated Scintilla.NET editor to allow user more control over the generated
+script.
 
-2025-04: Abandoned project—left the firm.
+Events and code are stored in two linear lists. Mid-session manual edits desync
+the lists, block the optimizer. ASTs are overkill for now. As a workaround,
+only edit scripts at the end of recording.
+
+2025-03: Shipped the first iteration. Well-received by colleagues.
+
+CPUs already saturated. Systems stalling. Began work on key optimization:
+bypass the browser, grab data files directly when possible.
+
+2025-04: Abandoned project. Left the firm.
