---
title: Built a browser session script synthesizer
date: 2025-04-02
layout: post
project: true
thumbnail: thumb_sm.png
---

One year at the trading firm. The web scraper is causing too many problems:
CPUs are saturated, servers are stalling.

2025-02: Built Bumblebee, a C# WinForms application, to record browser sessions
and automate the synthesis of scripts.

<video style="max-width:100%; margin-bottom: 10px" controls="" poster="poster.png">
  <source src="bee.mp4" type="video/mp4">
</video>

Hosted WebView2 (Edge) in the WinForms application to render web content.

Intercepted events by injecting JS hooks into web pages (client-side events) and
listening to WebView events (internal browser events). Converted intercepted
events to Selenium code by passing them through a chain of if-else statements.
Crude, but no time for something better.
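
A minimal sketch of that if-else translation, in Python rather than the C# the tool was written in. The event shape (`type`, `selector`, `url`, `value`), the function name, and the choice to emit Selenium's Python bindings are all assumptions for illustration; the real event format is not shown here.

```python
def event_to_selenium(event: dict) -> str:
    """Translate one recorded event into one line of Selenium (Python) code.

    The event dict layout is hypothetical: {"type", "selector", "url", "value"}.
    """
    kind = event["type"]
    selector = event.get("selector", "")
    # Emitted code assumes `driver` and `By` exist in the generated script.
    locate = f"driver.find_element(By.CSS_SELECTOR, {selector!r})"

    if kind == "navigate":
        return f"driver.get({event['url']!r})"
    elif kind == "click":
        return f"{locate}.click()"
    elif kind == "input":
        return f"{locate}.send_keys({event['value']!r})"
    else:
        # Unknown events are kept as comments so nothing is silently dropped.
        return f"# unhandled event: {kind}"


recorded = [
    {"type": "navigate", "url": "https://example.com"},
    {"type": "click", "selector": "#search"},
    {"type": "input", "selector": "#search", "value": "bees"},
]
script = [event_to_selenium(e) for e in recorded]
```

Every new event type means another branch, which is why this approach stays crude as the event vocabulary grows.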

Implemented a basic optimizer to squash event sequences into single commands
(e.g., calendar clicks → text input) and to use heuristics to improve DOM
addressing (XPath, id, element).
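
A rough sketch of both optimizer ideas, in Python for illustration only. It squashes runs of keystrokes into one text input, which is a simpler stand-in for the calendar-click case, and it prefers an element id over an XPath. The event fields, both function names, and the "id beats XPath" rule are assumptions, not the actual heuristics.

```python
def squash_keys(events: list[dict]) -> list[dict]:
    """Merge consecutive 'key' events on the same selector into one 'input' event."""
    out: list[dict] = []
    for ev in events:
        prev = out[-1] if out else None
        if ev["type"] != "key":
            out.append(dict(ev))  # copy so the recorded list is left untouched
        elif prev and prev["type"] == "input" and prev["selector"] == ev["selector"]:
            prev["value"] += ev["key"]  # extend the run already in progress
        else:
            out.append({"type": "input", "selector": ev["selector"], "value": ev["key"]})
    return out


def best_locator(element: dict) -> tuple[str, str]:
    """Pick a locator for an element; assumed heuristic: a non-empty id beats XPath."""
    if element.get("id"):
        return ("id", element["id"])
    return ("xpath", element["xpath"])


events = [
    {"type": "click", "selector": "#date"},
    {"type": "key", "selector": "#date", "key": "2"},
    {"type": "key", "selector": "#date", "key": "5"},
    {"type": "click", "selector": "#go"},
]
squashed = squash_keys(events)
```

Here the four recorded events collapse to three: a click, a single input with value `"25"`, and a click.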

Integrated the Scintilla.NET editor to give the user more control over the
generated script.

Events and code are stored in two linear lists. Mid-session manual edits desync
the lists and block the optimizer. ASTs are overkill for now. As a workaround,
only edit scripts at the end of recording.
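
To illustrate the desync, here is a small Python sketch that assumes the two lists are paired by index (event `i` produced code line `i`). The class and method names are hypothetical, and the pairing scheme itself is an assumption.

```python
class Recording:
    """Two parallel lists, assumed to be paired by index."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self.code: list[str] = []

    def record(self, event: str, line: str) -> None:
        # Recording appends to both lists, so indices stay paired.
        self.events.append(event)
        self.code.append(line)

    def manual_insert(self, index: int, line: str) -> None:
        # A user edit touches only the code list; no event backs this line.
        self.code.insert(index, line)

    def aligned(self) -> bool:
        # Weak check: an in-place edit keeps the lengths equal but still
        # breaks the pairing between an event and its line.
        return len(self.events) == len(self.code)


rec = Recording()
rec.record("click #date", "date.click()")
rec.record("key 2", "date.send_keys('2')")
before = rec.aligned()
rec.manual_insert(1, "time.sleep(1)")
after = rec.aligned()
```

After the manual insert, `code[1]` no longer corresponds to `events[1]`, so an optimizer that rewrites code by event index would edit the wrong line.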

2025-03: Shipped the first iteration and began work on the key optimization:
bypass the browser and grab data files directly when possible.

2025-04: Abandoned project. Left the firm.