| author | Sadeep Madurange <sadeep@asciimx.com> | 2025-12-07 17:27:22 +0800 |
|---|---|---|
| committer | Sadeep Madurange <sadeep@asciimx.com> | 2025-12-07 17:27:22 +0800 |
| commit | 6101dc4cb4f0e018fad6f067e8f2448a453433b7 | |
| tree | 149a699dcd5ca49389ea6907c9e2600771c8dbc8 /_projects/bumblebee.md | |
| parent | 2c3d7cd47104017f3858ee79fbd6976046448dab | |
Bumblebee
Diffstat (limited to '_projects/bumblebee.md')
| -rw-r--r-- | _projects/bumblebee.md | 52 |
1 files changed, 38 insertions, 14 deletions
```diff
diff --git a/_projects/bumblebee.md b/_projects/bumblebee.md
index ef38b2f..a8fa5eb 100644
--- a/_projects/bumblebee.md
+++ b/_projects/bumblebee.md
@@ -6,23 +6,47 @@ thumbnail: thumb.png
 layout: post
 ---
 
-Bumblebee is a web browser that converts browser sessions into C# scripts for
-playback. It eliminates the need for authoring browser automation scripts.
+Bumblebee is a tool I built for one of my employers to automate the generation
+of web scraping scripts.
 
 <video style="max-width:100%; margin-bottom: 10px" controls="" poster="thumb.png">
 <source src="bee.mp4" type="video/mp4">
 </video>
 
-Bumblebee is a Windows Forms application written in C#. Web content is rendered
-by the embedded Microsoft Edge browser (via WebView). The text editor on the
-right is <a src="https://github.com/desjarlais/Scintilla.NET" class="external"
-target="_blank" rel="noopener noreferrer">Scintilla.NET</a>. Users can
-override the generated script at any point during the session. The users can
-configure Bumblebee to debounce events, ignore hidden elements, etc.
-
-Bumblebee works by injecting a custom JavaScript program that tracks user
-interactions. The tracker intercepts and sends them to the Bumblebee backend as
-events for analysis. In addition to the front-end events, Bumblebee also
-intercepts events internal to the web browser, which it then interprets to
-generate C# code for the Selenium WebDriver in real time.
+In 2024, we were tasked with collecting market data using various methods,
+including scraping data from authorized websites for traders' use.
+
+Manual authoring of such scripts took time. The scripts were often brittle due
+to the complex nature of modern websites, and they lacked optimizations such as
+bypassing the UI and retrieving the data files directly when possible, which
+would have significantly reduced our compute costs.
+
+To alleviate these challenges, I, with the help of a colleague, Andy Zhang,
+built Bumblebee: a C# Windows Forms desktop application that uses Microsoft
+Edge <a src="https://developer.microsoft.com/en-us/microsoft-edge/webview2"
+class="external" target="_blank" rel="noopener noreferrer">WebView2</a> for
+rendering web content.
+
+Bumblebee works by injecting a custom JavaScript program that intercepts
+client-side events and sends them to Bumblebee for analysis. In addition to
+front-end events, Bumblebee also captures internal browser events, which it
+then interprets to generate code in real time. Note that we developed Bumblebee
+before the advent of now-popular LLMs. Bumblebee reliably handles dynamic
+websites and pop-ups. The user can access developer tools, override any part of
+the script at any point during the session (using the embedded <a
+src="https://github.com/desjarlais/Scintilla.NET" class="external"
+target="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor), debounce
+events, and block hidden elements and scripts.
+
+Before settling on a desktop application, we contemplated a browser extension.
+We decided against that because we didn't want the browser vendor to dictate
+Bumblebee's capabilities. Furthermore, the company's security policy prohibited
+browser extensions, complicating its deployment. The initial prototype used a
+C# wrapper of the Chromium project instead of WebView. Its incoherent API
+design led us to toss it in favour of WebView, which presented a well-designed
+API that interfaced seamlessly with Windows Forms.
+
+Bumblebee reduced the time we spent on authoring scripts from hours to a few
+minutes. Since the rules for code generation were written and optimized by
+experts in web technologies, the output was more robust.
```
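The post describes Bumblebee's pipeline only in prose. An injected script reports client-side events, and the host interprets them to generate code in real time. The removed text names C# for Selenium WebDriver as the generated language. The TypeScript below is a minimal, hypothetical sketch of the interpretation step, not Bumblebee's implementation.

The following names and details are invented for illustration:
- the names `RecordedEvent`, `debounceInputs`, `csString` and `generateSelenium`;
- the event shape;
- the 300 ms debounce window;
- the assumption that the tracker has already computed a CSS selector and a visibility flag for each element.

The emitted strings use real Selenium .NET calls: `Navigate().GoToUrl`, `FindElement(By.CssSelector(...))`, `Click()` and `SendKeys()`. In WebView2, an injected script can reach the host through `window.chrome.webview.postMessage`, which raises `WebMessageReceived` on the host. The post does not say whether Bumblebee used that channel.

```typescript
// Hypothetical event shape; the real tracker's message format is not documented.
type RecordedEvent =
  | { kind: "navigate"; url: string; time: number }
  | { kind: "click"; selector: string; visible: boolean; time: number }
  | { kind: "input"; selector: string; value: string; visible: boolean; time: number };

// Trailing-edge debounce: consecutive "input" events on the same selector that
// arrive within windowMs of the previous one are merged, keeping the latest value.
function debounceInputs(events: RecordedEvent[], windowMs: number): RecordedEvent[] {
  const out: RecordedEvent[] = [];
  for (const ev of events) {
    const prev = out.length > 0 ? out[out.length - 1] : undefined;
    if (
      ev.kind === "input" &&
      prev !== undefined &&
      prev.kind === "input" &&
      prev.selector === ev.selector &&
      ev.time - prev.time <= windowMs
    ) {
      out[out.length - 1] = ev; // replace the partial value; the window restarts at ev.time
    } else {
      out.push(ev);
    }
  }
  return out;
}

// Quote a string as a C# regular string literal (escapes backslashes and quotes only).
function csString(s: string): string {
  return '"' + s.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Emit one Selenium WebDriver (C#) statement per event.
function generateSelenium(events: RecordedEvent[], ignoreHidden: boolean): string[] {
  const lines: string[] = [];
  for (const ev of events) {
    // Optionally skip interactions with elements the tracker reported as hidden.
    if (ev.kind !== "navigate" && ignoreHidden && !ev.visible) continue;
    switch (ev.kind) {
      case "navigate":
        lines.push(`driver.Navigate().GoToUrl(${csString(ev.url)});`);
        break;
      case "click":
        lines.push(`driver.FindElement(By.CssSelector(${csString(ev.selector)})).Click();`);
        break;
      case "input":
        // Assumes the field starts empty; SendKeys appends to any existing text.
        lines.push(
          `driver.FindElement(By.CssSelector(${csString(ev.selector)})).SendKeys(${csString(ev.value)});`
        );
        break;
    }
  }
  return lines;
}

// Example session: a typing burst, a click on a hidden element, then a submit.
const demo: RecordedEvent[] = [
  { kind: "navigate", url: "https://example.com/", time: 0 },
  { kind: "input", selector: "#q", value: "m", visible: true, time: 100 },
  { kind: "input", selector: "#q", value: "ma", visible: true, time: 250 },
  { kind: "input", selector: "#q", value: "mar", visible: true, time: 400 },
  { kind: "click", selector: "#ad", visible: false, time: 900 },
  { kind: "click", selector: 'button[type="submit"]', visible: true, time: 1200 },
];
for (const line of generateSelenium(debounceInputs(demo, 300), true)) {
  console.log(line);
}
```

With a 300 ms window, the three keystroke events on `#q` collapse into a single `SendKeys("mar")` statement instead of three. With `ignoreHidden` set, the click on the invisible `#ad` element is dropped. This loosely mirrors the "debounce events" and "block hidden elements" options the post mentions. A production generator would also need robust selector synthesis and waits for dynamic content, which this sketch leaves out.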
