From 57ff09d2eefefa2462a2af0175e3e8164c7bc828 Mon Sep 17 00:00:00 2001
From: Sadeep Madurange
Date: Mon, 5 Jan 2026 21:10:33 +0800
Subject: Sharpen Bumblebee post.

---
 _log/bumblebee.md | 46 ++++++++++++++++++++--------------------------
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/_log/bumblebee.md b/_log/bumblebee.md
index 588ae42..25f08e4 100644
--- a/_log/bumblebee.md
+++ b/_log/bumblebee.md
@@ -1,45 +1,39 @@
 ---
-title: "Bumblebee: browser automation"
+title: 'Bumblebee: web script synthesizer'
 date: 2025-04-02
 layout: post
 project: true
 thumbnail: thumb_sm.png
 ---

-Built with Andy Zhang for an employer. Tool to automate web scraping script
-generation.
+Work project. Browser session-to-code conversion.

-Manual script authoring took hours. Scripts poorly optimized, CPUs maxed
-constantly, cloud costs excessive.
+Architecture: C# WinForms host, embedded browser, code editor. Browser
+extension rejected due to security policy and shallow event control.

-Initially considered browser extension. Desktop app won—extensions don't give
-deep event control. Company policy blocked extensions anyway.
+Tool evaluation:

-First prototype: C# Win Forms + CefSharp.
+  - CefSharp: Discarded. API lacked elegance.
+  - WebView2: Selected. Better WinForms integration. Hard dependency on
+    Microsoft Edge--acceptable for corporate Windows environments.

-Second prototype: C# Win Forms + WebView2. Packaging and distribution more
-complex, but the API is well-designed; integrates well with Win Forms.
+Implementation:

-Microsoft Edge required. Portability not a concern, only need to target
-controlled Windows environments. Choosing WebView2 over CefSharp.
+  1. Interception: Injected JS hooks; internal browser event monitoring
+     (pop-ups/downloads).
+  2. Transformation: Event → Token → Instruction Table → String.
+  3. Optimization: Parallel event/text lists processing; rendered
+     in Scintilla.NET

-Embed Scintilla.NET editor for
-overriding generated script.
+Bug: Manual mid-session overrides desync code/event lists, bypassing optimizer.
+Linear lists inadequate for state synchronization. Need to rethink data
+structures; look to compiler Abstract Syntax Trees (AST) for intermediate
+representation.

-Code generation sequence: Inject JavaScript to intercept client-side events.
-Capture internal browser events (pop-ups, file downloads). Event
-raised → parsed into a token → insert to list → interpret event → look up
-instruction from a table → form instruction with event args → insert text to a
-parallel list → run both lists through optimizer → update Scintilla editor.
-
-Limitation: manual overriding via Scintilla editor mid-session causes the code
-list to go out of sync with the event list. Optimizer can't handle this yet.
-
-Note to self: need to rethink the event/text list data structures in the
-context of the optimizer--look to compilers for inspiration maybe?
+Verdict: Serves its purpose.
-- 
cgit v1.2.3
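
Reviewer note: the Event → Token → Instruction Table → String pipeline described in the post could look roughly like the sketch below. This is an illustration only, not Bumblebee's code. The `Token` fields, the `INSTRUCTION_TABLE` entries, the output script dialect (`goto`, `click`, `type`) and the single optimizer rule are all assumptions made up for the example. It also derives the text from the token list instead of keeping a parallel text list, which is a simplification of what the post describes.

```python
from dataclasses import dataclass


# Hypothetical token layout; the post does not describe the real one.
@dataclass(frozen=True)
class Token:
    kind: str        # e.g. "navigate", "click", "input"
    target: str      # URL or CSS selector
    value: str = ""  # typed text, if any


# Hypothetical instruction table: event kind -> script template.
# The script dialect is invented for this example.
INSTRUCTION_TABLE = {
    "navigate": 'goto("{target}")',
    "click": 'click("{target}")',
    "input": 'type("{target}", "{value}")',
}


def tokenize(event: dict) -> Token:
    """Parse a raw event (assumed here to arrive as a dict) into a token."""
    return Token(event["type"], event["target"], event.get("value", ""))


def emit(token: Token) -> str:
    """Look up the template for the token kind and fill in the event args."""
    template = INSTRUCTION_TABLE[token.kind]
    return template.format(target=token.target, value=token.value)


def optimize(tokens: list[Token]) -> list[Token]:
    """Toy rule: collapse consecutive inputs on one target, keeping the last."""
    out: list[Token] = []
    for tok in tokens:
        same_field = (
            out
            and tok.kind == "input"
            and out[-1].kind == "input"
            and out[-1].target == tok.target
        )
        if same_field:
            out[-1] = tok  # later keystroke state replaces the earlier one
        else:
            out.append(tok)
    return out


def synthesize(events: list[dict]) -> list[str]:
    """Run events through tokenize -> optimize -> emit."""
    return [emit(t) for t in optimize([tokenize(e) for e in events])]
```

Because every script line is regenerated from the token list, the text cannot drift away from the events. That is one possible reading of the "look to ASTs" idea in the Bug paragraph: keep a single intermediate representation and treat the editor text as a rendering of it.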