---
title: "Bumblebee: browser automation"
date: 2025-04-02
layout: post
project: true
thumbnail: thumb_sm.png
---

Built with Andy Zhang for an employer. Tool to automate web scraping script
generation.

<video style="max-width:100%; margin-bottom: 10px" controls="" poster="poster.png">
  <source src="bee.mp4" type="video/mp4">
</video>

Manual script authoring took hours. Scripts poorly optimized, CPUs maxed
constantly, cloud costs excessive.

Initially considered browser extension. Desktop app won—extensions don't give
deep event control. Company policy blocked extensions anyway.

First prototype: C# WinForms + CefSharp.

Second prototype: C# WinForms + WebView2. Packaging and distribution more
complex, but the API is well-designed and integrates well with WinForms.

Microsoft Edge required. Portability not a concern; only need to target
controlled Windows environments. Chose WebView2 over CefSharp.

Embed <a href="https://github.com/desjarlais/Scintilla.NET" class="external"
target="_blank" rel="noopener noreferrer">Scintilla.NET</a> editor for
overriding generated script.

Code generation sequence: Inject JavaScript to intercept client-side events.
Capture internal browser events (pop-ups, file downloads). Event
raised → parsed into a token → token inserted into event list → event
interpreted → instruction looked up from a table → instruction formed with
event args → text inserted into a parallel list → both lists run through
optimizer → Scintilla editor updated.
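The sequence above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual C# implementation: the `Token` shape, the `INSTRUCTIONS` table, the script syntax it emits, and the single optimizer rule (collapse consecutive inputs on the same selector) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str   # hypothetical event kind, e.g. "click", "input", "download"
    args: dict  # event arguments, e.g. {"selector": "#go"}


# Hypothetical instruction table: event kind -> script line template.
INSTRUCTIONS = {
    "click": 'click("{selector}")',
    "input": 'type("{selector}", "{value}")',
    "download": 'wait_for_download("{path}")',
}


def parse(raw: dict) -> Token:
    """Turn a raw event payload into a token."""
    args = {k: v for k, v in raw.items() if k != "type"}
    return Token(kind=raw["type"], args=args)


def emit(token: Token) -> str:
    """Look up the instruction template and fill in the event arguments."""
    return INSTRUCTIONS[token.kind].format(**token.args)


def optimize(events, lines):
    """Toy rule: of consecutive inputs on one selector, keep only the last."""
    out_events, out_lines = [], []
    for tok, line in zip(events, lines):
        same_field = (
            out_events
            and tok.kind == "input"
            and out_events[-1].kind == "input"
            and out_events[-1].args["selector"] == tok.args["selector"]
        )
        if same_field:
            out_events[-1], out_lines[-1] = tok, line
        else:
            out_events.append(tok)
            out_lines.append(line)
    return out_events, out_lines


def record(raw_events):
    """Build the event list and the parallel text list, then optimize both."""
    events, lines = [], []
    for raw in raw_events:
        tok = parse(raw)
        events.append(tok)
        lines.append(emit(tok))
    events, lines = optimize(events, lines)
    return events, "\n".join(lines)  # the joined text would go to the editor
```

For simplicity the sketch optimizes once at the end of a recording; the real tool presumably runs the optimizer as events arrive so the editor stays current.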

Problem: manual overriding via Scintilla editor mid-session causes the code
list to go out of sync with the event list. Optimizer can't handle this yet.

Note to self: need to rethink the event/text list data structures in the
context of the optimizer. Look to compilers for inspiration, maybe?
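One possible direction, sketched in Python purely as a thought experiment (nothing here reflects what Bumblebee does today): replace the two parallel lists with a single list of entries that carry both the event data and the script text, plus a flag marking manual overrides. The `Entry` fields and the rule that the optimizer never merges an edited entry are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    kind: str                # event kind, e.g. "input" or "click"
    selector: Optional[str]  # target element, if any
    text: str                # generated or hand-edited script line
    edited: bool = False     # set once the user overrides the text


def override(entries, index, new_text):
    """Record a manual edit on the entry itself, so nothing can drift apart."""
    entries[index].text = new_text
    entries[index].edited = True


def optimize(entries):
    """Collapse consecutive inputs on one selector, skipping edited entries."""
    out = []
    for entry in entries:
        prev = out[-1] if out else None
        mergeable = (
            prev is not None
            and not prev.edited
            and not entry.edited
            and prev.kind == entry.kind == "input"
            and prev.selector == entry.selector
        )
        if mergeable:
            out[-1] = entry
        else:
            out.append(entry)
    return out
```

Because event and text live in one record, an override cannot desynchronize them; the cost is that edited entries act as barriers the optimizer has to work around.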