author Sadeep Madurange <sadeep@asciimx.com> 2025-12-27 17:36:09 +0800
committer Sadeep Madurange <sadeep@asciimx.com> 2025-12-27 18:36:05 +0800
commit 15ab870972f3fbeeec24ae70f1eb2ad19bc0be11
tree 72cecf1e031b1bfcc3a615743e1b9be192705cd7 /_log/bumblebee.md
parent 023ea62a08492d9dda680cee3e706a988b1ae6f0
Bumblebee.
Diffstat (limited to '_log/bumblebee.md')
-rw-r--r-- _log/bumblebee.md 68
1 files changed, 31 insertions, 37 deletions
diff --git a/_log/bumblebee.md b/_log/bumblebee.md
index c26fa0d..74fc06b 100644
--- a/_log/bumblebee.md
+++ b/_log/bumblebee.md
@@ -6,46 +6,40 @@ project: true
thumbnail: thumb_sm.png
---
-Bumblebee is a tool I built for one of my employers to automate the generation
-of web scraping scripts.
+Built with Andy Zhang for an employer. Tool to automate web scraping script
+generation.
<video style="max-width:100%; margin-bottom: 10px" controls="" poster="poster.png">
<source src="bee.mp4" type="video/mp4">
</video>
-In 2024, we were tasked with collecting market data using various methods,
-including scraping data from authorized websites for traders' use.
-
-Manual authoring of such scripts took time. The scripts were often brittle due
-to the complexity of the modern web, and they lacked optimizations such as
-bypassing the UI and retrieving the data files directly when possible, which
-would have significantly reduced our compute costs.
-
-To alleviate these challenges, I, with the help of a colleague, Andy Zhang,
-built Bumblebee: a web browser powered by C# Windows Forms, Microsoft Edge <a
-src="https://developer.microsoft.com/en-us/microsoft-edge/webview2"
-class="external" target="_blank" rel="noopener noreferrer">WebView2</a>, and
-the <a src="https://github.com/desjarlais/Scintilla.NET" class="external"
-toarget="_blank" rel="noopener noreferrer">Scintilla.NET</a> text editor.
-
-Bumblebee works by injecting a custom JavaScript program that intercepts
-client-side events and sends them to Bumblebee for analysis. In addition to
-front-end events, Bumblebee also captures internal browser events, which it
-then interprets to generate code in real time. Note that we developed Bumblebee
-before the advent of now-popular LLMs. Bumblebee supports dynamic websites,
-pop-ups, developer tools, live manual override, event debouncing, and filtering
-hidden elements and scripts.
-
-Before settling on a desktop application, we contemplated designing Bumblebee
-as a browser extension. We chose the desktop app because extensions don't offer
-the deep, event-based control we needed. Besides, the company's security
-policy, which prohibited browser extensions, would have complicated the
-deployment of an extension-based solution. My first prototype used a C# binding
-of the Chromium project. WebView's more intuitive API and its seamless
-integration with Windows Forms led us to choose it over the Chromium wrapper.
-
-What began as a personal side project to improve my own workflow enabled us to
-collectively improve the quality of our web scripts at a much larger scale.
-Bumblebee predictably reduced the time we spent on authoring scripts from hours
-to a few minutes.
+Manual script authoring took hours. Scripts poorly optimized, CPUs maxed
+constantly, cloud costs excessive.
+
+Initially considered browser extension. Desktop app won—extensions don't give
+deep event control. Company policy blocked extensions anyway.
+
+First prototype: C# Win Forms + CefSharp.
+
+Second prototype: C# Win Forms + WebView2. Packaging and distribution more
+complex, but the API is well-designed; integrates well with Win Forms.
+
+Microsoft Edge required. Portability not a concern, only need to target
+controlled Windows environments. Choosing WebView2 over CefSharp.
+
+Embed <a href="https://github.com/desjarlais/Scintilla.NET" class="external"
+target="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor for
+overriding generated script.
+
+Code generation sequence: Inject JavaScript to intercept client-side events.
+Capture internal browser events (pop-ups, file downloads). Event
+raised → parsed into a token → insert to list → interpret event → look up
+instruction from a table → form instruction with event args → insert text to a
+parallel list → run both lists through optimizer → update Scintilla editor.
+
+Problem: manual overriding via Scintilla editor mid-session causes the code
+list to go out of sync with the event list. Optimizer can't handle this yet.
+
+Note to self: need to rethink the event/text list data structures in the
+context of the optimizer--look to compilers for inspiration maybe?
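
Review note: the "Code generation sequence" added in this commit (event → token → instruction table lookup → parallel text list → optimizer) can be illustrated with a small sketch. This is not Bumblebee's code, which the log says is written in C#. Python is used here only for brevity, and every name (`Event`, `INSTRUCTIONS`, `interpret`, `optimize`, `generate`), the template strings, and the optimizer rule (collapsing consecutive inputs on the same selector) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A parsed event token (hypothetical shape)."""
    kind: str                      # e.g. "click", "input", "download"
    args: dict = field(default_factory=dict)


# Hypothetical instruction table: event kind -> code template.
INSTRUCTIONS = {
    "click": 'click("{selector}")',
    "input": 'type("{selector}", "{value}")',
    "download": 'download("{url}")',
}


def interpret(event):
    """Look up the instruction and fill it with the event args.

    Returns None for event kinds that have no table entry.
    """
    template = INSTRUCTIONS.get(event.kind)
    return None if template is None else template.format(**event.args)


def optimize(events, lines):
    """Toy optimizer: keep only the last of consecutive inputs on one selector.

    Both lists are rewritten together so they stay index-aligned.
    """
    out_events, out_lines = [], []
    for ev, line in zip(events, lines):
        same_field = (
            out_events
            and ev.kind == "input"
            and out_events[-1].kind == "input"
            and out_events[-1].args.get("selector") == ev.args.get("selector")
        )
        if same_field:
            out_events[-1], out_lines[-1] = ev, line
        else:
            out_events.append(ev)
            out_lines.append(line)
    return out_events, out_lines


def generate(raw_events):
    """Run raw (kind, args) pairs through the whole pipeline."""
    events, lines = [], []
    for kind, args in raw_events:
        ev = Event(kind, args)
        line = interpret(ev)
        if line is None:           # unknown event kind: skip it
            continue
        events.append(ev)
        lines.append(line)
    return optimize(events, lines)
```

Because the two lists are only linked by index, a manual edit to the text list alone breaks the pairing, which is the out-of-sync problem the note at the end of the diff describes.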