---
title: "Bumblebee: browser automation"
date: 2025-04-02
thumbnail: thumb.png
layout: post
---

Bumblebee is a tool I built for one of my employers to automate the generation
of web scraping scripts.

<video style="max-width:100%; margin-bottom: 10px" controls="" poster="thumb.png">
  <source src="bee.mp4" type="video/mp4">
</video>

In 2024, we were tasked with collecting market data using various methods,
including scraping data from authorized websites for traders' use.

Writing such scripts by hand was time-consuming. The scripts were often brittle
because of the complexity of modern websites, and they lacked optimizations such
as bypassing the UI and retrieving the data files directly when possible, which
would have significantly reduced our compute costs.
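The post doesn't show how Bumblebee implements the direct-download optimization. What follows is only a minimal TypeScript sketch of the idea. It assumes the recorder exposes the network responses it observed as `CapturedResponse` records, a hypothetical type. If a successful `GET` returned a CSV, JSON, or spreadsheet file, the generated script fetches that URL directly instead of replaying the clicks that triggered it. The `download(...)` call and the MIME list are illustrative assumptions, not part of any real API.

```typescript
// Hypothetical record of a network response observed during a recording session.
interface CapturedResponse {
  url: string;
  method: string;
  status: number;
  contentType: string; // raw Content-Type header, e.g. "text/csv; charset=utf-8"
}

// MIME types treated as data files in this sketch (an assumed, non-exhaustive list).
const DATA_FILE_TYPES: readonly string[] = [
  "text/csv",
  "application/json",
  "application/vnd.ms-excel",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
];

// A response counts as a directly fetchable data file if a successful GET
// returned one of the MIME types above (header parameters are ignored).
function isDataFile(r: CapturedResponse): boolean {
  const mime = r.contentType.split(";")[0].trim().toLowerCase();
  return r.method.toUpperCase() === "GET" && r.status === 200 && DATA_FILE_TYPES.includes(mime);
}

// Prefers a direct download when a data file was observed; otherwise keeps
// the recorded UI steps. download(...) is a hypothetical helper in the
// generated script, not a real library call.
function planRetrieval(responses: CapturedResponse[], uiSteps: string[]): string[] {
  const dataFile = responses.find(isDataFile);
  return dataFile ? [`await download(${JSON.stringify(dataFile.url)});`] : uiSteps;
}
```

This heuristic is deliberately crude. Many `GET` requests that return JSON are ordinary page traffic rather than the file a trader wants. A real direct request would also need the session's cookies or authentication headers to succeed.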

To address these challenges, I built Bumblebee with the help of my colleague
Andy Zhang. Bumblebee is a C# Windows Forms desktop application that uses
Microsoft Edge <a href="https://developer.microsoft.com/en-us/microsoft-edge/webview2"
class="external" target="_blank" rel="noopener noreferrer">WebView2</a> for
rendering web content.

Bumblebee works by injecting a custom JavaScript program into the page. This
program intercepts client-side events and sends them to Bumblebee for analysis.
In addition to front-end events, Bumblebee captures internal browser events,
which it interprets to generate code in real time. We developed Bumblebee before
LLMs became popular, so its code generation is rule-based rather than
model-driven. Bumblebee reliably handles dynamic websites and pop-ups. The user
can open the developer tools and override any part of the script at any point
during the session using the embedded <a
href="https://github.com/desjarlais/Scintilla.NET" class="external"
target="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor. The user can
also debounce events and block hidden elements and scripts.
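The post doesn't include Bumblebee's code, so what follows is only a minimal TypeScript sketch of the record, debounce, and generate pipeline described above. It assumes the injected script posts each event as a `RecordedEvent` record, a hypothetical type. In WebView2, a page script can send such a message with `window.chrome.webview.postMessage`, and the C# host receives it through the `CoreWebView2.WebMessageReceived` event. The sketch collapses bursts of `input` events on the same element, keeping only the last value. It then maps each remaining event to one line of script. The Playwright-style `page.goto`/`page.click`/`page.fill` lines are an assumed target format, not necessarily what Bumblebee emits.

```typescript
// Hypothetical record posted by the injected script for each captured event.
interface RecordedEvent {
  type: "navigate" | "click" | "input";
  target: string;    // CSS selector of the element, or the URL for "navigate"
  value?: string;    // field value for "input" events
  timeStamp: number; // milliseconds, like DOM Event.timeStamp
}

// Debounces bursts of "input" events on the same element. An input that
// follows the previously kept input on the same target by less than windowMs
// replaces it, so only the last event of each burst survives.
function debounceInputs(events: RecordedEvent[], windowMs: number): RecordedEvent[] {
  const out: RecordedEvent[] = [];
  for (const e of events) {
    const prev = out.length > 0 ? out[out.length - 1] : undefined;
    const sameBurst =
      prev !== undefined &&
      e.type === "input" &&
      prev.type === "input" &&
      prev.target === e.target &&
      e.timeStamp - prev.timeStamp < windowMs;
    if (sameBurst) {
      out[out.length - 1] = e; // keep the latest value of the burst
    } else {
      out.push(e);
    }
  }
  return out;
}

// Rule table: one hypothetical code template per event type.
function toCode(e: RecordedEvent): string {
  const target = JSON.stringify(e.target);
  switch (e.type) {
    case "navigate":
      return `await page.goto(${target});`;
    case "click":
      return `await page.click(${target});`;
    case "input":
      return `await page.fill(${target}, ${JSON.stringify(e.value ?? "")});`;
    default:
      throw new Error(`unsupported event type: ${String(e.type)}`);
  }
}

// Turns a recorded session into script lines, debouncing typing first.
// The 300 ms default window is an arbitrary assumption.
function generateScript(events: RecordedEvent[], windowMs = 300): string[] {
  return debounceInputs(events, windowMs).map(toCode);
}
```

The sketch debounces on recorded timestamps rather than with a live timer. This keeps the functions pure and easy to test. A recorder running in the page could instead wait with `setTimeout` before posting each burst.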

Before settling on a desktop application, we considered building Bumblebee as a
browser extension. We decided against that because we didn't want browser
vendors to dictate Bumblebee's capabilities. Besides, the company's security
policy prohibited browser extensions, which would have complicated the
deployment of an extension-based solution. The initial prototype used a C#
wrapper around Chromium instead of WebView2. We chose WebView2 over the Chromium
wrapper for its more intuitive API and its seamless integration with Windows
Forms.

As expected, Bumblebee reduced the time we spent authoring scripts from hours to
a few minutes. Because experts in web technologies wrote and optimized the code
generation rules, the quality of the scripts improved as well.