Diffstat (limited to '_log/bumblebee.md')
-rw-r--r--  _log/bumblebee.md  51
1 files changed, 51 insertions, 0 deletions
diff --git a/_log/bumblebee.md b/_log/bumblebee.md
new file mode 100644
index 0000000..c26fa0d
--- /dev/null
+++ b/_log/bumblebee.md
@@ -0,0 +1,51 @@
+---
+title: "Bumblebee: browser automation"
+date: 2025-04-02
+layout: post
+project: true
+thumbnail: thumb_sm.png
+---
+
+Bumblebee is a tool I built for one of my employers to automate the generation
+of web scraping scripts.
+
+<video style="max-width:100%; margin-bottom: 10px" controls="" poster="poster.png">
+  <source src="bee.mp4" type="video/mp4">
+</video>
+
+In 2024, we were tasked with collecting market data using various methods,
+including scraping data from authorized websites for traders' use.
+
+Manual authoring of such scripts took time. The scripts were often brittle due
+to the complexity of the modern web, and they lacked optimizations such as
+bypassing the UI and retrieving the data files directly when possible, which
+would have significantly reduced our compute costs.
+
+To alleviate these challenges, I, with the help of a colleague, Andy Zhang,
+built Bumblebee: a web browser powered by C# Windows Forms, Microsoft Edge <a
+href="https://developer.microsoft.com/en-us/microsoft-edge/webview2"
+class="external" target="_blank" rel="noopener noreferrer">WebView2</a>, and
+the <a href="https://github.com/desjarlais/Scintilla.NET" class="external"
+target="_blank" rel="noopener noreferrer">Scintilla.NET</a> text editor.
+
+Bumblebee works by injecting a custom JavaScript program that intercepts
+client-side events and sends them to Bumblebee for analysis. In addition to
+front-end events, Bumblebee also captures internal browser events, which it
+then interprets to generate code in real time. Note that we developed Bumblebee
+before the advent of now-popular LLMs. Bumblebee supports dynamic websites,
+pop-ups, developer tools, live manual override, event debouncing, and filtering
+hidden elements and scripts.
+
+Before settling on a desktop application, we contemplated designing Bumblebee
+as a browser extension. We chose the desktop app because extensions don't offer
+the deep, event-based control we needed. Besides, the company's security
+policy, which prohibited browser extensions, would have complicated the
+deployment of an extension-based solution. My first prototype used a C# binding
+of the Chromium project. WebView2's more intuitive API and its seamless
+integration with Windows Forms led us to choose it over the Chromium wrapper.
+
+What began as a personal side project to improve my own workflow enabled us to
+collectively improve the quality of our web scripts at a much larger scale.
+Bumblebee predictably reduced the time we spent on authoring scripts from hours
+to a few minutes.
+
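The post describes Bumblebee injecting a script that intercepts client-side events and forwards them to the host application. The post does not show that script, so the following is only a minimal sketch of the general pattern, not Bumblebee's actual code. The names `selectorFor` and `createRecorder`, the 200 ms window, and the choice of event types are all hypothetical. The only real API used is `window.chrome.webview.postMessage`, which WebView2 exposes to page scripts. Repeated identical events inside the window are dropped, a simple stand-in for the event debouncing the post mentions.

```javascript
// Illustrative sketch only: function names and the 200 ms window are assumptions.

// Build a simple CSS selector for an element-like object.
function selectorFor(el) {
  if (el.id) return `#${el.id}`;
  const tag = String(el.tagName || '').toLowerCase();
  const raw = typeof el.className === 'string' ? el.className : '';
  const classes = raw.trim().split(/\s+/).filter(Boolean);
  return classes.length ? `${tag}.${classes.join('.')}` : tag;
}

// Create a recorder that forwards an event description through `post`,
// skipping an event if the same type and selector was seen less than
// `waitMs` milliseconds earlier.
function createRecorder(post, waitMs = 200, now = Date.now) {
  let lastKey = null;
  let lastTime = -Infinity;
  return function record(event) {
    const msg = { type: event.type, selector: selectorFor(event.target) };
    const key = `${msg.type}|${msg.selector}`;
    const t = now();
    const duplicate = key === lastKey && t - lastTime < waitMs;
    lastKey = key;
    lastTime = t;
    if (!duplicate) post(msg);
    return !duplicate;
  };
}

// Wiring for a page hosted in WebView2; skipped where chrome.webview is absent.
if (typeof window !== 'undefined' && window.chrome && window.chrome.webview) {
  const record = createRecorder((m) => window.chrome.webview.postMessage(m));
  for (const type of ['click', 'change', 'submit']) {
    // Capture phase, so the recorder sees events before page handlers can stop them.
    document.addEventListener(type, record, true);
  }
}
```

On the host side, WebView2 offers `CoreWebView2.AddScriptToExecuteOnDocumentCreatedAsync` for injecting a script into every page and the `WebMessageReceived` event for receiving posted messages. Whether Bumblebee uses these particular calls is not stated in the post.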
