From 37dca88045810de3fb123d8df083eeb488dc1f86 Mon Sep 17 00:00:00 2001 From: Sadeep Madurange Date: Tue, 21 Apr 2026 22:26:02 +0800 Subject: Neo4J post written in a more narrative form. --- _log/neo4j-a-star-search.md | 50 +++++++++++++++++++++++++++++++-------------- 1 file changed, 35 insertions(+), 15 deletions(-) (limited to '_log/neo4j-a-star-search.md') diff --git a/_log/neo4j-a-star-search.md b/_log/neo4j-a-star-search.md index 51d3692..9f4a7d2 100644 --- a/_log/neo4j-a-star-search.md +++ b/_log/neo4j-a-star-search.md @@ -1,19 +1,25 @@ --- -title: Neo4J A* optimization +title: That time I forked Neo4J to implement A* search date: 2018-03-06 layout: post --- Written in 2026, backdated to 2018. -First real performance problem. Vessel tracking with Neo4J hit a wall. Need to -find shortest paths in a 13,000-vertex graph. Dijkstra's search, which Neo4J -ships with, slowed to a crawl after 4,000. +Our vessel tracking system hit a wall. We needed shortest paths through a +13,000-vertex graph. But the system slowed to a crawl after 4,000. Kristian, +our CTO, suggested replacing Dijkstra's search with the faster A* search—an +algorithm I had encountered as an undergraduate. -Spent better part of the week replacing Dijkstra's algorithm with A* -search—Python shop, junior developer, took more than a day to set up toolchain -and build the project. Haversine function as heuristic uses distance between -ports as the crow flies to steer search: +The database we used, Neo4J, employed a plug-in system for graph algorithms—a +mature open-source project developed by experts. Forking it for our unassuming +start-up was bold. Often, application developers are reluctant to step into +lower layers. + +It took me a week to set up the toolchain, build the project, and run the +tests. I reviewed Dijkstra's implementation, separating logic from +infrastructure. I then refactored the algorithm to use the haversine function +to steer the search: ``` private double computeHeuristic( @@ -37,7 +43,8 @@ private double computeHeuristic( } ``` -Core search loop updates costs when better path found: +And revised the core search loop to update the costs when a better path is +found: ``` private void updateCosts( @@ -52,15 +59,28 @@ private void updateCosts( } ``` -Verdict: 300x speedup—scaled to 13,000 vertices. +I remember celebrating a 300x speed up, and comfortably scaling to the target +13,000. + +I was even more thrilled when the upstream project accepted my changes, and +invited me to contribute. In hindsight, I'm a little surprised that my changes +were accepted in their original form. The heuristic was tightly coupled to our +use case. Decoupling it would require significant refactoring and interface +changes. + +It was, for a time, the best outcome I could have imagined. -Upstreamed changes: Neo4J v3.4.0 | +class="external" target="_blank" rel="noopener noreferrer">Neo4J v3.4.0 | Full source. -NOTE: Despite impressive gain, performance horizon visible. Unlikely to scale past -16,000 vertices without caching. - -- cgit v1.2.3