Diffstat (limited to '_site/projects/bumblebee/index.html')
| -rw-r--r-- | _site/projects/bumblebee/index.html | 48 |
1 files changed, 35 insertions, 13 deletions
diff --git a/_site/projects/bumblebee/index.html b/_site/projects/bumblebee/index.html
index 8219448..576885f 100644
--- a/_site/projects/bumblebee/index.html
+++ b/_site/projects/bumblebee/index.html
@@ -44,24 +44,46 @@
 <h2 class="center" id="title">BUMBLEBEE: BROWSER AUTOMATION</h2>
 <h6 class="center">02 APRIL 2025</h5>
 <br>
- <div class="twocol justify"><p>Bumblebee is a web browser that converts browser sessions into C# scripts for
-playback. It eliminates the need for authoring browser automation scripts.</p>
+ <div class="twocol justify"><p>Bumblebee is a tool I built for one of my employers to automate the generation
+of web scraping scripts.</p>
 
 <video style="max-width:100%; margin-bottom: 10px" controls="" poster="thumb.png">
 <source src="bee.mp4" type="video/mp4" />
 </video>
 
-<p>Bumblebee is a Windows Forms application written in C#. Web content is rendered
-by the embedded Microsoft Edge browser (via WebView). The text editor on the
-right is <a src="https://github.com/desjarlais/Scintilla.NET" class="external" target="_blank" rel="noopener noreferrer">Scintilla.NET</a>. Users can
-override the generated script at any point during the session. The users can
-configure Bumblebee to debounce events, ignore hidden elements, etc.</p>
-
-<p>Bumblebee works by injecting a custom JavaScript program that tracks user
-interactions. The tracker intercepts and sends them to the Bumblebee backend as
-events for analysis. In addition to the front-end events, Bumblebee also
-intercepts events internal to the web browser, which it then interprets to
-generate C# code for the Selenium WebDriver in real time.</p>
+<p>In 2024, we were tasked with collecting market data using various methods,
+including scraping data from authorized websites for traders’ use.</p>
+
+<p>Manual authoring of such scripts took time. The scripts were often brittle due
+to the complex nature of modern websites, and they lacked optimizations such as
+bypassing the UI and retrieving the data files directly when possible, which
+would have significantly reduced our compute costs.</p>
+
+<p>To alleviate these challenges, I, with the help of a colleague, Andy Zhang,
+built Bumblebee: a C# Windows Forms desktop application that uses Microsoft
+Edge <a src="https://developer.microsoft.com/en-us/microsoft-edge/webview2" class="external" target="_blank" rel="noopener noreferrer">WebView2</a> for
+rendering web content.</p>
+
+<p>Bumblebee works by injecting a custom JavaScript program that intercepts
+client-side events and sends them to Bumblebee for analysis. In addition to
+front-end events, Bumblebee also captures internal browser events, which it
+then interprets to generate code in real time. Note that we developed Bumblebee
+before the advent of now-popular LLMs. Bumblebee reliably handles dynamic
+websites and pop-ups. The user can access developer tools, override any part of
+the script at any point during the session (using the embedded <a src="https://github.com/desjarlais/Scintilla.NET" class="external" target="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor), debounce
+events, and block hidden elements and scripts.</p>
+
+<p>Before settling on a desktop application, we contemplated a browser extension.
+We decided against that because we didn’t want the browser vendor to dictate
+Bumblebee’s capabilities. Furthermore, the company’s security policy prohibited
+browser extensions, complicating its deployment. The initial prototype used a
+C# wrapper of the Chromium project instead of WebView. Its incoherent API
+design led us to toss it in favour of WebView, which presented a well-designed
+API that interfaced seamlessly with Windows Forms.</p>
+
+<p>Bumblebee reduced the time we spent on authoring scripts from hours to a few
+minutes. Since the rules for code generation were written and optimized by
+experts in web technologies, the output was more robust.</p>
 
 </div>
 <p class="post-author right">by Wickramage Don Sadeep Madurange</p>
