Diffstat (limited to '_site/log/bumblebee/index.html')
-rw-r--r--  _site/log/bumblebee/index.html  | 64
1 files changed, 30 insertions, 34 deletions
diff --git a/_site/log/bumblebee/index.html b/_site/log/bumblebee/index.html
index 7fabd41..2aebc2f 100644
--- a/_site/log/bumblebee/index.html
+++ b/_site/log/bumblebee/index.html
@@ -44,45 +44,41 @@
 <h2 class="center" id="title">BUMBLEBEE: BROWSER AUTOMATION</h2>
 <h6 class="center">02 APRIL 2025</h5>
 <br>
- <div class="twocol justify"><p>Bumblebee is a tool I built for one of my employers to automate the generation
-of web scraping scripts.</p>
+ <div class="twocol justify"><p>Built with Andy Zhang for an employer. Tool to automate web scraping script
+generation.</p>
 <video style="max-width:100%; margin-bottom: 10px" controls="" poster="poster.png">
 <source src="bee.mp4" type="video/mp4" />
 </video>
-<p>In 2024, we were tasked with collecting market data using various methods,
-including scraping data from authorized websites for traders’ use.</p>
-
-<p>Manual authoring of such scripts took time. The scripts were often brittle due
-to the complexity of the modern web, and they lacked optimizations such as
-bypassing the UI and retrieving the data files directly when possible, which
-would have significantly reduced our compute costs.</p>
-
-<p>To alleviate these challenges, I, with the help of a colleague, Andy Zhang,
-built Bumblebee: a web browser powered by C# Windows Forms, Microsoft Edge <a src="https://developer.microsoft.com/en-us/microsoft-edge/webview2" class="external" target="_blank" rel="noopener noreferrer">WebView2</a>, and
-the <a src="https://github.com/desjarlais/Scintilla.NET" class="external" toarget="_blank" rel="noopener noreferrer">Scintilla.NET</a> text editor.</p>
-
-<p>Bumblebee works by injecting a custom JavaScript program that intercepts
-client-side events and sends them to Bumblebee for analysis. In addition to
-front-end events, Bumblebee also captures internal browser events, which it
-then interprets to generate code in real time. Note that we developed Bumblebee
-before the advent of now-popular LLMs. Bumblebee supports dynamic websites,
-pop-ups, developer tools, live manual override, event debouncing, and filtering
-hidden elements and scripts.</p>
-
-<p>Before settling on a desktop application, we contemplated designing Bumblebee
-as a browser extension. We chose the desktop app because extensions don’t offer
-the deep, event-based control we needed. Besides, the company’s security
-policy, which prohibited browser extensions, would have complicated the
-deployment of an extension-based solution. My first prototype used a C# binding
-of the Chromium project. WebView’s more intuitive API and its seamless
-integration with Windows Forms led us to choose it over the Chromium wrapper.</p>
-
-<p>What began as a personal side project to improve my own workflow enabled us to
-collectively improve the quality of our web scripts at a much larger scale.
-Bumblebee predictably reduced the time we spent on authoring scripts from hours
-to a few minutes.</p>
+<p>Manual script authoring took hours. Scripts poorly optimized, CPUs maxed
+constantly, cloud costs excessive.</p>
+
+<p>Initially considered browser extension. Desktop app won—extensions don’t give
+deep event control. Company policy blocked extensions anyway.</p>
+
+<p>First prototype: C# Win Forms + CefSharp.</p>
+
+<p>Second prototype: C# Win Forms + WebView2. Packaging and distribution more
+complex, but the API is well-designed; integrates well with Win Forms.</p>
+
+<p>Microsoft Edge required. Portability not a concern, only need to target
+controlled Windows environments. Choosing WebView2 over CefSharp.</p>
+
+<p>Embed <a href="https://github.com/desjarlais/Scintilla.NET" class="external" toarget="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor for
+overriding generated script.</p>
+
+<p>Code generation sequence: Inject JavaScript to intercept client-side events.
+Capture internal browser events (pop-ups, file downloads). Event
+raised → parsed into a token → insert to list → interpret event → look up
+instruction from a table → form instruction with event args → insert text to a
+parallel list → run both lists through optimizer → update Scintilla editor.</p>
+
+<p>Problem: manual overriding via Scintilla editor mid-session causes the code
+list to go out of sync with the event list. Optimizer can’t handle this yet.</p>
+
+<p>Note to self: need to rethink the event/text list data structures in the
+context of the optimizer–look to compilers for inspiration maybe?</p>
 </div>
 <p class="post-author right">by W. D. Sadeep Madurange</p>

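The problem described in the added lines (manual edits in the editor pushing the code list out of sync with the event list) can be illustrated with a small, hypothetical Python example. It assumes an optimizer pass that pairs events and code lines purely by list index; the event kinds, script lines, `wait(500)` edit, and `merge_consecutive_inputs` rule are invented for illustration. After a single hand-inserted line, the pass deletes that line, drops the click, and keeps a stale keystroke.

```python
# Hypothetical illustration: events and code lines are paired by index only.

def merge_consecutive_inputs(events, code):
    """Drop every 'input' event that is immediately followed by another
    'input', keeping only the last one. Assumes code[i] came from events[i]."""
    keep = [
        i for i in range(len(events))
        if not (i + 1 < len(events) and events[i] == "input" and events[i + 1] == "input")
    ]
    return [events[i] for i in keep], [code[i] for i in keep]


events = ["navigate", "input", "input", "click"]
code = ['goto("https://example.com")', 'type("#q", "b")', 'type("#q", "be")', 'click("#go")']

# In sync: code[i] was generated from events[i], so the merge is correct.
_, merged = merge_consecutive_inputs(events, code)
print(merged)  # ['goto("https://example.com")', 'type("#q", "be")', 'click("#go")']

# A manual edit inserts a line in the editor; the event list is unchanged.
edited = code[:1] + ["wait(500)"] + code[1:]
_, merged_after_edit = merge_consecutive_inputs(events, edited)
print(merged_after_edit)  # ['goto("https://example.com")', 'type("#q", "b")', 'type("#q", "be")']
```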