diff options
| author | Sadeep Madurange <sadeep@asciimx.com> | 2026-04-21 22:26:02 +0800 |
|---|---|---|
| committer | Sadeep Madurange <sadeep@asciimx.com> | 2026-04-22 15:03:46 +0800 |
| commit | 37dca88045810de3fb123d8df083eeb488dc1f86 (patch) | |
| tree | d4303325fd8dc8bde874b59ea09857c53339c838 | |
| parent | 22927ad6cd53441c8cdfca677bf1297156f6cccf (diff) | |
| download | www-37dca88045810de3fb123d8df083eeb488dc1f86.tar.gz | |
Neo4J post written in a more narrative form.
| -rw-r--r-- | _log/neo4j-a-star-search.md | 50 | ||||
| -rw-r--r-- | assets/css/main.css | 2 |
2 files changed, 36 insertions, 16 deletions
diff --git a/_log/neo4j-a-star-search.md b/_log/neo4j-a-star-search.md index 51d3692..9f4a7d2 100644 --- a/_log/neo4j-a-star-search.md +++ b/_log/neo4j-a-star-search.md @@ -1,19 +1,25 @@ --- -title: Neo4J A* optimization +title: That time I forked Neo4J to implement A* search date: 2018-03-06 layout: post --- Written in 2026, backdated to 2018. -First real performance problem. Vessel tracking with Neo4J hit a wall. Need to -find shortest paths in a 13,000-vertex graph. Dijkstra's search, which Neo4J -ships with, slowed to a crawl after 4,000. +Our vessel tracking system hit a wall. We needed shortest paths through a +13,000-vertex graph. But the system slowed to a crawl after 4,000. Kristian, +our CTO, suggested replacing Dijkstra's search with the faster A* search—an +algorithm I had encountered as an undergraduate. -Spent better part of the week replacing Dijkstra's algorithm with A* -search—Python shop, junior developer, took more than a day to set up toolchain -and build the project. Haversine function as heuristic uses distance between -ports as the crow flies to steer search: +The database we used, Neo4J, employed a plug-in system for graph algorithms—a +mature open-source project developed by experts. Forking it for our unassuming +start-up was bold. Often, application developers are reluctant to step into +lower layers. + +It took me a week to set up the toolchain, build the project, and run the +tests. I reviewed Dijkstra's implementation, separating logic from +infrastructure. I then refactored the algorithm to use the haversine function +to steer the search: ``` private double computeHeuristic( @@ -37,7 +43,8 @@ private double computeHeuristic( } ``` -Core search loop updates costs when better path found: +And revised the core search loop to update the costs when a better path is +found: ``` private void updateCosts( @@ -52,15 +59,28 @@ private void updateCosts( } ``` -Verdict: 300x speedup—scaled to 13,000 vertices. +I remember celebrating a 300x speed up, and comfortably scaling to the target +13,000. + +I was even more thrilled when the upstream project accepted my changes, and +invited me to contribute. In hindsight, I'm a little surprised that my changes +were accepted in their original form. The heuristic was tightly coupled to our +use case. Decoupling it would require significant refactoring and interface +changes. + +It was, for a time, the best outcome I could have imagined. -Upstreamed changes: <a +My celebration of success, however, was short-lived. In a matter of months, as +the database grew larger and denser—to 16,000 vertices, if memory serves—A* +search too began to struggle. We fell back on a far less elegant PostgreSQL +cache. I learnt my first bitter lesson about optimization: dumb caches beat +clever algorithms. I briefly wondered if I should have stayed with pure +mathematics. + +Source code: <a href="https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases/tag/3.4.0.0" -class="external" target="_blank" rel="noopener noreferrer">Neo4J v3.4.0</a> | +class="external" target="_blank" rel="noopener noreferrer">Neo4J v3.4.0</a> | <a href="https://github.com/neo4j-contrib/neo4j-graph-algorithms/blob/bd9732d9a690319552e134708692acb5a0d6b37c/algo/src/main/java/org/neo4j/graphalgo/impl/ShortestPathAStar.java" class="external" target="_blank" rel="noopener noreferrer">Full source</a>. -NOTE: Despite impressive gain, performance horizon visible. Unlikely to scale past -16,000 vertices without caching. - diff --git a/assets/css/main.css b/assets/css/main.css index 7f330c8..7ef24e5 100644 --- a/assets/css/main.css +++ b/assets/css/main.css @@ -28,7 +28,7 @@ header h1 { pre, code { font-size: inherit; - background: #1a1200; + background: #222222; color: #ffb300; } |
