| author | Sadeep Madurange <sadeep@asciimx.com> | 2026-01-05 21:10:33 +0800 |
|---|---|---|
| committer | Sadeep Madurange <sadeep@asciimx.com> | 2026-01-06 06:58:29 +0800 |
| commit | 57ff09d2eefefa2462a2af0175e3e8164c7bc828 | |
| tree | 1358bb9e9a4ff7f1015fa4439a7e1c1312ebe5f5 /_log | |
| parent | a6440c00abbc30230f8a59c737e4ec55cb82a350 | |
| download | www-57ff09d2eefefa2462a2af0175e3e8164c7bc828.tar.gz | |
Sharpen Bumblebee post.
Diffstat (limited to '_log')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | _log/bumblebee.md | 46 |
1 files changed, 20 insertions, 26 deletions
```diff
diff --git a/_log/bumblebee.md b/_log/bumblebee.md
index 588ae42..25f08e4 100644
--- a/_log/bumblebee.md
+++ b/_log/bumblebee.md
@@ -1,45 +1,39 @@
 ---
-title: "Bumblebee: browser automation"
+title: 'Bumblebee: web script synthesizer'
 date: 2025-04-02
 layout: post
 project: true
 thumbnail: thumb_sm.png
 ---

-Built with Andy Zhang for an employer. Tool to automate web scraping script
-generation.
+Work project. Browser session-to-code conversion.

 <video style="max-width:100%; margin-bottom: 10px" controls="" poster="poster.png">
 <source src="bee.mp4" type="video/mp4">
 </video>

-Manual script authoring took hours. Scripts poorly optimized, CPUs maxed
-constantly, cloud costs excessive.
+Architecture: C# WinForms host, embedded browser, code editor. Browser
+extension rejected due to security policy and shallow event control.

-Initially considered browser extension. Desktop app won—extensions don't give
-deep event control. Company policy blocked extensions anyway.
+Tool evaluation:

-First prototype: C# Win Forms + CefSharp.
+ - CefSharp: Discarded. API lacked elegance.
+ - WebView2: Selected. Better WinForms integration. Hard dependency on
+ Microsoft Edge--acceptable for corporate Windows environments.

-Second prototype: C# Win Forms + WebView2. Packaging and distribution more
-complex, but the API is well-designed; integrates well with Win Forms.
+Implementation:

-Microsoft Edge required. Portability not a concern, only need to target
-controlled Windows environments. Choosing WebView2 over CefSharp.
+ 1. Interception: Injected JS hooks; internal browser event monitoring
+ (pop-ups/downloads).
+ 2. Transformation: Event → Token → Instruction Table → String.
+ 3. Optimization: Parallel event/text lists processing; rendered
+ in <a href="https://github.com/desjarlais/Scintilla.NET" class="external"
+ target="_blank" rel="noopener noreferrer">Scintilla.NET</a>

-Embed <a href="https://github.com/desjarlais/Scintilla.NET" class="external"
-toarget="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor for
-overriding generated script.
+Bug: Manual mid-session overrides desync code/event lists, bypassing optimizer.
+Linear lists inadequate for state synchronization. Need to rethink data
+structures; look to compiler Abstract Syntax Trees (AST) for intermediate
+representation.

-Code generation sequence: Inject JavaScript to intercept client-side events.
-Capture internal browser events (pop-ups, file downloads). Event
-raised → parsed into a token → insert to list → interpret event → look up
-instruction from a table → form instruction with event args → insert text to a
-parallel list → run both lists through optimizer → update Scintilla editor.
-
-Limitation: manual overriding via Scintilla editor mid-session causes the code
-list to go out of sync with the event list. Optimizer can't handle this yet.
-
-Note to self: need to rethink the event/text list data structures in the
-context of the optimizer--look to compilers for inspiration maybe?
+Verdict: Serves its purpose.
```
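The pipeline the post describes (Event → Token → Instruction Table → String, with an optimizer running over parallel event and text lists) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the tool's C# implementation. The event kinds, the `INSTRUCTION_TABLE` templates, the `Session` class, and the optimizer rule (keep only the last of consecutive inputs into the same field) are all hypothetical. The sketch also reproduces the desync bug: once a code line is edited by hand, it no longer corresponds to its event, and the lockstep optimizer refuses to run.

```python
from dataclasses import dataclass, field

# Hypothetical instruction table: token kind -> code template.
# The post names the table but not its contents or the target script syntax.
INSTRUCTION_TABLE = {
    "navigate": "goto({url!r})",
    "click": "click({selector!r})",
    "input": "fill({selector!r}, {value!r})",
    "download": "expect_download()",
}


@dataclass(frozen=True)
class Token:
    """A parsed browser event: its kind plus its arguments."""
    kind: str
    args: tuple  # sorted (name, value) pairs


def tokenize(raw_event: dict) -> Token:
    """Parse a raw event dict (as an injected JS hook might report it)."""
    args = {k: v for k, v in raw_event.items() if k != "type"}
    return Token(raw_event["type"], tuple(sorted(args.items())))


def render(token: Token) -> str:
    """Look up the template for the token kind and fill in its arguments."""
    return INSTRUCTION_TABLE[token.kind].format(**dict(token.args))


@dataclass
class Session:
    events: list = field(default_factory=list)  # one Token per step
    lines: list = field(default_factory=list)   # parallel list: code per step

    def record(self, raw_event: dict) -> None:
        """Event -> Token -> instruction table -> string, appended in lockstep."""
        token = tokenize(raw_event)
        self.events.append(token)
        self.lines.append(render(token))

    def in_sync(self) -> bool:
        """True if every code line is still what its event would generate."""
        return len(self.events) == len(self.lines) and all(
            render(tok) == line for tok, line in zip(self.events, self.lines)
        )

    def optimize(self) -> None:
        """Hypothetical pass: of consecutive inputs into the same field,
        keep only the last one. Both lists are rewritten together."""
        if not self.in_sync():
            # A manual edit broke the event/line pairing; nothing to trust.
            raise RuntimeError("event and code lists out of sync")
        events, lines = [], []
        for tok, line in zip(self.events, self.lines):
            prev = events[-1] if events else None
            same_field = (
                prev is not None
                and tok.kind == prev.kind == "input"
                and dict(tok.args)["selector"] == dict(prev.args)["selector"]
            )
            if same_field:
                events[-1], lines[-1] = tok, line  # newer input replaces older
            else:
                events.append(tok)
                lines.append(line)
        self.events, self.lines = events, lines


if __name__ == "__main__":
    session = Session()
    session.record({"type": "navigate", "url": "https://example.com"})
    session.record({"type": "input", "selector": "#q", "value": "b"})
    session.record({"type": "input", "selector": "#q", "value": "be"})
    session.record({"type": "click", "selector": "#go"})
    session.optimize()
    print(session.lines)
    # ["goto('https://example.com')", "fill('#q', 'be')", "click('#go')"]

    session.lines.insert(1, "wait(500)")  # manual override in the editor
    print(session.in_sync())  # False
```

Keeping the two lists index-aligned is what lets the optimizer rewrite code by reasoning about events. A free-text edit in the editor has no corresponding event, which is the desync the post describes.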
