Diffstat (limited to '_site/projects')
-rw-r--r-- _site/projects/bumblebee/index.html | 48
1 files changed, 35 insertions, 13 deletions
diff --git a/_site/projects/bumblebee/index.html b/_site/projects/bumblebee/index.html
index 8219448..576885f 100644
--- a/_site/projects/bumblebee/index.html
+++ b/_site/projects/bumblebee/index.html
@@ -44,24 +44,46 @@
<h2 class="center" id="title">BUMBLEBEE: BROWSER AUTOMATION</h2>
<h6 class="center">02 APRIL 2025</h6>
<br>
- <div class="twocol justify"><p>Bumblebee is a web browser that converts browser sessions into C# scripts for
-playback. It eliminates the need for authoring browser automation scripts.</p>
+ <div class="twocol justify"><p>Bumblebee is a tool I built for one of my employers to automate the generation
+of web scraping scripts.</p>
<video style="max-width:100%; margin-bottom: 10px" controls="" poster="thumb.png">
<source src="bee.mp4" type="video/mp4" />
</video>
-<p>Bumblebee is a Windows Forms application written in C#. Web content is rendered
-by the embedded Microsoft Edge browser (via WebView). The text editor on the
-right is <a src="https://github.com/desjarlais/Scintilla.NET" class="external" target="_blank" rel="noopener noreferrer">Scintilla.NET</a>. Users can
-override the generated script at any point during the session. The users can
-configure Bumblebee to debounce events, ignore hidden elements, etc.</p>
-
-<p>Bumblebee works by injecting a custom JavaScript program that tracks user
-interactions. The tracker intercepts and sends them to the Bumblebee backend as
-events for analysis. In addition to the front-end events, Bumblebee also
-intercepts events internal to the web browser, which it then interprets to
-generate C# code for the Selenium WebDriver in real time.</p>
+<p>In 2024, we were tasked with collecting market data using various methods,
+including scraping data from authorized websites for traders’ use.</p>
+
+<p>Authoring these scripts by hand was slow. The scripts were often brittle
+because of the complexity of modern websites, and they lacked optimizations such
+as bypassing the UI to retrieve data files directly where possible, which
+would have significantly reduced our compute costs.</p>
+
+<p>To address these problems, a colleague, Andy Zhang, and I built Bumblebee:
+a C# Windows Forms desktop application that uses Microsoft
+Edge <a href="https://developer.microsoft.com/en-us/microsoft-edge/webview2" class="external" target="_blank" rel="noopener noreferrer">WebView2</a> for
+rendering web content.</p>
+
+<p>Bumblebee works by injecting a custom JavaScript program that intercepts
+client-side events and sends them to Bumblebee for analysis. In addition to
+front-end events, Bumblebee also captures internal browser events, which it
+then interprets to generate code in real time. Note that we developed Bumblebee
+before the advent of now-popular LLMs. Bumblebee reliably handles dynamic
+websites and pop-ups. The user can access developer tools, override any part of
+the script at any point during the session (using the embedded <a href="https://github.com/desjarlais/Scintilla.NET" class="external" target="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor), debounce
+events, and block hidden elements and scripts.</p>
+
+<p>Before settling on a desktop application, we contemplated a browser extension.
+We decided against that because we didn’t want the browser vendor to dictate
+Bumblebee’s capabilities. Furthermore, the company’s security policy prohibited
+browser extensions, which would have complicated deployment. The initial
+prototype used a C# wrapper of the Chromium project instead of WebView2. Its
+incoherent API design led us to toss it in favour of WebView2, which presented
+a well-designed API that interfaced seamlessly with Windows Forms.</p>
+
+<p>Bumblebee reduced the time we spent on authoring scripts from hours to a few
+minutes. Since the rules for code generation were written and optimized by
+experts in web technologies, the generated scripts were also more robust than
+the ones we had written by hand.</p>
</div>
<p class="post-author right">by Wickramage Don Sadeep Madurange</p>
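
The post describes the mechanism (intercept client-side events, debounce them, and turn them into script lines) without showing code. The following is a minimal TypeScript sketch of that idea, not Bumblebee's actual implementation. The `RecordedEvent` shape and the function names `debounceEvents` and `toScriptLine` are hypothetical. The emitted lines assume Selenium WebDriver's C# API as the target, which only the removed revision of the page mentions.

```typescript
// Hypothetical shape of an event reported by the injected tracker.
// The real message format used by Bumblebee is not described in the post.
interface RecordedEvent {
  kind: "click" | "input";
  selector: string; // CSS selector identifying the target element
  value?: string;   // current field value, for input events
  timeMs: number;   // timestamp in milliseconds
}

// Collapse a burst of input events on the same element into the last one,
// so that typing "bee" yields one step instead of three.
function debounceEvents(events: RecordedEvent[], windowMs: number): RecordedEvent[] {
  const out: RecordedEvent[] = [];
  for (const ev of events) {
    const prev = out[out.length - 1];
    const sameBurst =
      prev !== undefined &&
      prev.kind === "input" &&
      ev.kind === "input" &&
      prev.selector === ev.selector &&
      ev.timeMs - prev.timeMs <= windowMs;
    if (sameBurst) {
      out[out.length - 1] = ev; // keep only the most recent value
    } else {
      out.push(ev);
    }
  }
  return out;
}

// Render one event as a line of Selenium-style C# (assumed target).
// JSON.stringify is used as an approximation of C# string escaping.
function toScriptLine(ev: RecordedEvent): string {
  const target = `driver.FindElement(By.CssSelector(${JSON.stringify(ev.selector)}))`;
  return ev.kind === "click"
    ? `${target}.Click();`
    : `${target}.SendKeys(${JSON.stringify(ev.value ?? "")});`;
}

// Usage: three keystrokes and a click become two script lines.
const session: RecordedEvent[] = [
  { kind: "input", selector: "#q", value: "b", timeMs: 0 },
  { kind: "input", selector: "#q", value: "be", timeMs: 50 },
  { kind: "input", selector: "#q", value: "bee", timeMs: 90 },
  { kind: "click", selector: "#go", timeMs: 400 },
];
const script: string[] = debounceEvents(session, 100).map(toScriptLine);
console.log(script.join("\n"));
```

In a WebView2 host, page-side code could forward each event with `window.chrome.webview.postMessage`, and the C# side would receive it through the `WebMessageReceived` event. Whether Bumblebee uses that channel is an assumption.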