---
title: "Bumblebee: browser automation"
date: 2025-04-02
thumbnail: thumb.png
layout: post
---
Bumblebee is a tool I built for one of my employers to automate the generation
of web scraping scripts.

<video style="max-width:100%; margin-bottom: 10px" controls="" poster="thumb.png">
<source src="bee.mp4" type="video/mp4">
</video>

In 2024, we were tasked with collecting market data by various methods,
including scraping authorized websites for our traders' use. Writing these
scripts by hand was slow. The scripts were often brittle because modern
websites are complex, and they missed optimizations such as bypassing the UI
and retrieving the data files directly when possible, which would have
significantly reduced our compute costs.
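
To make the "bypass the UI" idea concrete, here is a minimal sketch. It is not
taken from Bumblebee or from any of our scripts: the `ObservedResponse` shape,
the `findDirectDataUrl` helper, and the list of content types are assumptions
made for illustration. The idea is simply to look through the network
responses seen while a page was being driven and check whether one of them is
already the data file we want.

```typescript
// Hypothetical record of one network response seen during a recording session.
interface ObservedResponse {
  url: string;
  method: string;       // HTTP method, e.g. "GET"
  contentType: string;  // value of the Content-Type response header
}

// Content types treated as "data files" in this sketch (an assumption).
const DATA_TYPES = ["text/csv", "application/json", "application/vnd.ms-excel"];

// Return the URL of the first GET response that looks like a data file,
// or null when no such response was seen and the UI steps must stay.
function findDirectDataUrl(responses: ObservedResponse[]): string | null {
  for (const r of responses) {
    const type = r.contentType.toLowerCase();
    const isData = DATA_TYPES.some((t) => type.indexOf(t) === 0);
    if (r.method.toUpperCase() === "GET" && isData) {
      return r.url;
    }
  }
  return null;
}
```

A script could then issue a single HTTP request for that URL instead of
replaying the clicks that led to it, provided the URL is stable between
sessions and the site permits direct access.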

To address these challenges, I built Bumblebee with the help of a colleague,
Andy Zhang. It is a C# Windows Forms desktop application that uses Microsoft
Edge <a href="https://developer.microsoft.com/en-us/microsoft-edge/webview2"
class="external" target="_blank" rel="noopener noreferrer">WebView2</a> to
render web content.

Bumblebee works by injecting a custom JavaScript program that intercepts
client-side events and sends them to Bumblebee for analysis. In addition to
front-end events, Bumblebee also captures internal browser events, which it
then interprets to generate code in real time. Note that we developed Bumblebee
before the advent of now-popular LLMs.

Bumblebee reliably handles dynamic websites and pop-ups. The user can access
developer tools, override any part of the script at any point during the
session (using the embedded <a
href="https://github.com/desjarlais/Scintilla.NET" class="external"
target="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor), debounce
events, and block hidden elements and scripts.
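
As a rough illustration of the capture, debounce, and generate steps described
above, the sketch below shows what the analysis side of such a recorder might
look like. It is not Bumblebee's actual code: the `RecordedEvent` shape, the
`debounce` and `generate` helpers, and the Playwright-style output lines are
all assumptions made for this example. In a WebView2 host, the injected script
could forward each event with `window.chrome.webview.postMessage`, and the C#
side would receive it through the `WebMessageReceived` event.

```typescript
// Hypothetical shape of an event forwarded by the injected script.
// In the page it could be sent with window.chrome.webview.postMessage(event),
// an object that only exists inside a WebView2 control.
interface RecordedEvent {
  kind: "click" | "input";
  selector: string;  // CSS selector computed in the page
  value?: string;    // present for input events
  time: number;      // milliseconds since the session started
}

// Collapse bursts of input events on the same element (one per keystroke)
// into the latest one when they arrive within `windowMs` of each other.
function debounce(events: RecordedEvent[], windowMs: number): RecordedEvent[] {
  const out: RecordedEvent[] = [];
  for (const ev of events) {
    const prev = out[out.length - 1];
    const sameField =
      prev !== undefined &&
      prev.kind === "input" &&
      ev.kind === "input" &&
      prev.selector === ev.selector &&
      ev.time - prev.time <= windowMs;
    if (sameField) {
      out[out.length - 1] = ev; // keep only the latest value
    } else {
      out.push(ev);
    }
  }
  return out;
}

// Turn each remaining event into one line of a scraping script.
// The Playwright-style calls are an assumed output format.
function generate(events: RecordedEvent[]): string[] {
  return events.map((ev) =>
    ev.kind === "click"
      ? `await page.click(${JSON.stringify(ev.selector)});`
      : `await page.fill(${JSON.stringify(ev.selector)}, ${JSON.stringify(ev.value ?? "")});`
  );
}
```

With a 200 ms window, three keystroke events on the same field that arrive
80 ms apart collapse into a single `page.fill` line carrying the final value,
which keeps the generated script short.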

Before settling on a desktop application, we considered a browser extension.
We decided against it because we didn't want the browser vendor to dictate
Bumblebee's capabilities. Furthermore, the company's security policy prohibited
browser extensions, which would have complicated deployment. The initial
prototype used a C# wrapper of the Chromium project instead of WebView2.
WebView2's more intuitive API and its seamless integration with Windows Forms
led us to choose it over the Chromium wrapper.

Bumblebee reduced the time we spent authoring scripts from hours to a few
minutes. Since the rules for code generation were written and optimized by
experts in web technologies, the output was also more robust than our
hand-written scripts.