From 15ab870972f3fbeeec24ae70f1eb2ad19bc0be11 Mon Sep 17 00:00:00 2001
From: Sadeep Madurange
Date: Sat, 27 Dec 2025 17:36:09 +0800
Subject: Bumblebee.

---
 _site/feed.xml                 |  2 +-
 _site/log/bumblebee/index.html | 64 ++++++++++++++++++++----------------------
 _site/posts.xml                |  2 +-
 3 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/_site/feed.xml b/_site/feed.xml
index de57fda..eee3cb4 100644
--- a/_site/feed.xml
+++ b/_site/feed.xml
@@ -1 +1 @@
-Jekyll2025-12-27T13:04:53+08:00/feed.xmlASCIIMX | LogW. D. Sadeep MadurangeMatrix Rain: 2025 refactor2025-12-21T00:00:00+08:002025-12-21T00:00:00+08:00/log/matrix-digital-rainW. D. Sadeep MadurangeFingerprint door lock (LP)2025-08-18T00:00:00+08:002025-08-18T00:00:00+08:00/log/fpm-door-lock-lpW. D. Sadeep MadurangeHigh-side MOSFET switching2025-06-22T00:00:00+08:002025-06-22T00:00:00+08:00/log/mosfet-switchesW. D. Sadeep MadurangeATmega328P at 3.3V and 5V2025-06-10T00:00:00+08:002025-06-10T00:00:00+08:00/log/arduino-unoW. D. Sadeep MadurangeFingerprint door lock (RF)2025-06-05T00:00:00+08:002025-06-05T00:00:00+08:00/log/fpm-door-lock-rfW. D. Sadeep MadurangeBumblebee: browser automation2025-04-02T00:00:00+08:002025-04-02T00:00:00+08:00/log/bumblebeeW. D. Sadeep MadurangeATSAM3X8E bare-metal programming2024-09-16T00:00:00+08:002024-09-16T00:00:00+08:00/log/arduino-dueW. D. Sadeep MadurangeEtlas: e-paper dashboard2024-09-05T00:00:00+08:002024-09-05T00:00:00+08:00/log/etlasW. D. Sadeep MadurangeExperimental e-reader2023-10-24T00:00:00+08:002023-10-24T00:00:00+08:00/log/e-readerW. D. Sadeep MadurangeNeo4J A* search2018-03-06T00:00:00+08:002018-03-06T00:00:00+08:00/log/neo4j-a-star-searchW. D. Sadeep Madurange
\ No newline at end of file
+Jekyll2025-12-27T18:35:47+08:00/feed.xmlASCIIMX | LogW. D. Sadeep MadurangeMatrix Rain: 2025 refactor2025-12-21T00:00:00+08:002025-12-21T00:00:00+08:00/log/matrix-digital-rainW. D. Sadeep MadurangeFingerprint door lock (LP)2025-08-18T00:00:00+08:002025-08-18T00:00:00+08:00/log/fpm-door-lock-lpW. D. Sadeep MadurangeHigh-side MOSFET switching2025-06-22T00:00:00+08:002025-06-22T00:00:00+08:00/log/mosfet-switchesW. D. Sadeep MadurangeATmega328P at 3.3V and 5V2025-06-10T00:00:00+08:002025-06-10T00:00:00+08:00/log/arduino-unoW. D. Sadeep MadurangeFingerprint door lock (RF)2025-06-05T00:00:00+08:002025-06-05T00:00:00+08:00/log/fpm-door-lock-rfW. D. Sadeep MadurangeBumblebee: browser automation2025-04-02T00:00:00+08:002025-04-02T00:00:00+08:00/log/bumblebeeW. D. Sadeep MadurangeATSAM3X8E bare-metal programming2024-09-16T00:00:00+08:002024-09-16T00:00:00+08:00/log/arduino-dueW. D. Sadeep MadurangeEtlas: e-paper dashboard2024-09-05T00:00:00+08:002024-09-05T00:00:00+08:00/log/etlasW. D. Sadeep MadurangeExperimental e-reader2023-10-24T00:00:00+08:002023-10-24T00:00:00+08:00/log/e-readerW. D. Sadeep MadurangeNeo4J A* search2018-03-06T00:00:00+08:002018-03-06T00:00:00+08:00/log/neo4j-a-star-searchW. D. Sadeep Madurange
\ No newline at end of file
diff --git a/_site/log/bumblebee/index.html b/_site/log/bumblebee/index.html
index 7fabd41..2aebc2f 100644
--- a/_site/log/bumblebee/index.html
+++ b/_site/log/bumblebee/index.html
@@ -44,45 +44,41 @@

BUMBLEBEE: BROWSER AUTOMATION

02 APRIL 2025

-Bumblebee is a tool I built for one of my employers to automate the generation
-of web scraping scripts.

+Built with Andy Zhang for an employer. Tool to automate web scraping script
+generation.

-In 2024, we were tasked with collecting market data using various methods,
-including scraping data from authorized websites for traders’ use.
-
-Manual authoring of such scripts took time. The scripts were often brittle due
-to the complexity of the modern web, and they lacked optimizations such as
-bypassing the UI and retrieving the data files directly when possible, which
-would have significantly reduced our compute costs.
-
-To alleviate these challenges, I, with the help of a colleague, Andy Zhang,
-built Bumblebee: a web browser powered by C# Windows Forms, Microsoft Edge WebView2, and
-the Scintilla.NET text editor.
-
-Bumblebee works by injecting a custom JavaScript program that intercepts
-client-side events and sends them to Bumblebee for analysis. In addition to
-front-end events, Bumblebee also captures internal browser events, which it
-then interprets to generate code in real time. Note that we developed Bumblebee
-before the advent of now-popular LLMs. Bumblebee supports dynamic websites,
-pop-ups, developer tools, live manual override, event debouncing, and filtering
-hidden elements and scripts.
-
-Before settling on a desktop application, we contemplated designing Bumblebee
-as a browser extension. We chose the desktop app because extensions don’t offer
-the deep, event-based control we needed. Besides, the company’s security
-policy, which prohibited browser extensions, would have complicated the
-deployment of an extension-based solution. My first prototype used a C# binding
-of the Chromium project. WebView’s more intuitive API and its seamless
-integration with Windows Forms led us to choose it over the Chromium wrapper.
-
-What began as a personal side project to improve my own workflow enabled us to
-collectively improve the quality of our web scripts at a much larger scale.
-Bumblebee predictably reduced the time we spent on authoring scripts from hours
-to a few minutes.

+Manual script authoring took hours. Scripts poorly optimized, CPUs maxed
+constantly, cloud costs excessive.

+
+Initially considered browser extension. Desktop app won: extensions don’t give
+deep event control. Company policy blocked extensions anyway.

+
+First prototype: C# Win Forms + CefSharp.
+
+Second prototype: C# Win Forms + WebView2. Packaging and distribution more
+complex, but the API is well-designed; integrates well with Win Forms.
+
+Microsoft Edge required. Portability not a concern, only need to target
+controlled Windows environments. Choosing WebView2 over CefSharp.
+
+Embed Scintilla.NET editor for
+overriding generated script.
+
+Code generation sequence: Inject JavaScript to intercept client-side events.
+Capture internal browser events (pop-ups, file downloads). Event
+raised → parsed into a token → insert to list → interpret event → look up
+instruction from a table → form instruction with event args → insert text to a
+parallel list → run both lists through optimizer → update Scintilla editor.
+
+Problem: manual overriding via Scintilla editor mid-session causes the code
+list to go out of sync with the event list. Optimizer can’t handle this yet.
+
+Note to self: need to rethink the event/text list data structures in the
+context of the optimizer–look to compilers for inspiration maybe?

diff --git a/_site/posts.xml b/_site/posts.xml
index 043fc0d..c92221f 100644
--- a/_site/posts.xml
+++ b/_site/posts.xml
@@ -1 +1 @@
-Jekyll2025-12-27T13:04:53+08:00/posts.xmlASCIIMXW. D. Sadeep Madurange
\ No newline at end of file
+Jekyll2025-12-27T18:35:47+08:00/posts.xmlASCIIMXW. D. Sadeep Madurange
\ No newline at end of file
--
cgit v1.2.3
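
The code generation sequence described in the new bumblebee/index.html text (event raised, parsed into a token, inserted to a list, instruction looked up from a table, text inserted to a parallel list, both lists run through the optimizer) can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not Bumblebee's actual C# code: the `Token` fields, the `INSTRUCTIONS` table entries, the generated `click`/`type`/`fetch_file` calls, and the keystroke-collapsing optimizer rule are all hypothetical, and the string returned by `on_event` only stands in for the Scintilla editor update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Parsed form of one raw browser event (field names are hypothetical)."""
    kind: str        # e.g. "click", "input", "download"
    target: str      # CSS selector or URL the event refers to
    value: str = ""  # typed text, if any


# Hypothetical instruction table: event kind -> template for one script line.
INSTRUCTIONS = {
    "click": "click({target!r})",
    "input": "type({target!r}, {value!r})",
    "download": "fetch_file({target!r})",
}


def parse(raw: dict) -> Token:
    """Turn a raw event payload into a token."""
    return Token(raw["kind"], raw["target"], raw.get("value", ""))


def emit(token: Token) -> str:
    """Look up the instruction for a token and fill in the event arguments."""
    return INSTRUCTIONS[token.kind].format(target=token.target, value=token.value)


def optimize(events: list, lines: list) -> tuple:
    """Assumed rule: collapse consecutive inputs on one target, keeping the last.

    Both lists are rewritten together so they stay index-aligned.
    """
    out_events, out_lines = [], []
    for tok, line in zip(events, lines):
        prev = out_events[-1] if out_events else None
        if prev and prev.kind == tok.kind == "input" and prev.target == tok.target:
            out_events[-1], out_lines[-1] = tok, line
        else:
            out_events.append(tok)
            out_lines.append(line)
    return out_events, out_lines


class Generator:
    """Holds the event list and the parallel list of generated script lines."""

    def __init__(self) -> None:
        self.events = []
        self.lines = []

    def on_event(self, raw: dict) -> str:
        tok = parse(raw)
        self.events.append(tok)
        self.lines.append(emit(tok))
        self.events, self.lines = optimize(self.events, self.lines)
        # The joined text stands in for what would be pushed to the editor.
        return "\n".join(self.lines)
```

In this sketch `events` and `lines` are only ever rewritten together inside `optimize`. Editing the generated text on its own, as a manual override in the editor would, leaves the two lists misaligned, which is the out-of-sync problem the post notes.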