summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--_log/neo4j-a-star-search.md50
-rw-r--r--assets/css/main.css2
2 files changed, 36 insertions, 16 deletions
diff --git a/_log/neo4j-a-star-search.md b/_log/neo4j-a-star-search.md
index 51d3692..9f4a7d2 100644
--- a/_log/neo4j-a-star-search.md
+++ b/_log/neo4j-a-star-search.md
@@ -1,19 +1,25 @@
---
-title: Neo4J A* optimization
+title: That time I forked Neo4J to implement A* search
date: 2018-03-06
layout: post
---
Written in 2026, backdated to 2018.
-First real performance problem. Vessel tracking with Neo4J hit a wall. Need to
-find shortest paths in a 13,000-vertex graph. Dijkstra's search, which Neo4J
-ships with, slowed to a crawl after 4,000.
+Our vessel tracking system hit a wall. We needed shortest paths through a
+13,000-vertex graph. But the system slowed to a crawl after 4,000. Kristian,
+our CTO, suggested replacing Dijkstra's search with the faster A* search—an
+algorithm I had encountered as an undergraduate.
-Spent better part of the week replacing Dijkstra's algorithm with A*
-search—Python shop, junior developer, took more than a day to set up toolchain
-and build the project. Haversine function as heuristic uses distance between
-ports as the crow flies to steer search:
+The database we used, Neo4J, employed a plug-in system for graph algorithms—a
+mature open-source project developed by experts. Forking it for our unassuming
+start-up was bold. Often, application developers are reluctant to step into
+lower layers.
+
+It took me a week to set up the toolchain, build the project, and run the
+tests. I reviewed Dijkstra's implementation, separating logic from
+infrastructure. I then refactored the algorithm to use the haversine function
+to steer the search:
```
private double computeHeuristic(
@@ -37,7 +43,8 @@ private double computeHeuristic(
}
```
-Core search loop updates costs when better path found:
+And revised the core search loop to update the costs when a better path is
+found:
```
private void updateCosts(
@@ -52,15 +59,28 @@ private void updateCosts(
}
```
-Verdict: 300x speedup—scaled to 13,000 vertices.
+I remember celebrating a 300x speed up, and comfortably scaling to the target
+13,000.
+
+I was even more thrilled when the upstream project accepted my changes, and
+invited me to contribute. In hindsight, I'm a little surprised that my changes
+were accepted in their original form. The heuristic was tightly coupled to our
+use case. Decoupling it would require significant refactoring and interface
+changes.
+
+It was, for a time, the best outcome I could have imagined.
-Upstreamed changes: <a
+My celebration of success, however, was short-lived. In a matter of months, as
+the database grew larger and denser—to 16,000 vertices, if memory serves—A*
+search too began to struggle. We fell back on a far less elegant PostgreSQL
+cache. I learnt my first bitter lesson about optimization: dumb caches beat
+clever algorithms. I briefly wondered if I should have stayed with pure
+mathematics.
+
+Source code: <a
href="https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases/tag/3.4.0.0"
-class="external" target="_blank" rel="noopener noreferrer">Neo4J v3.4.0</a> |
+class="external" target="_blank" rel="noopener noreferrer">Neo4J v3.4.0</a> |
<a
href="https://github.com/neo4j-contrib/neo4j-graph-algorithms/blob/bd9732d9a690319552e134708692acb5a0d6b37c/algo/src/main/java/org/neo4j/graphalgo/impl/ShortestPathAStar.java"
class="external" target="_blank" rel="noopener noreferrer">Full source</a>.
-NOTE: Despite impressive gain, performance horizon visible. Unlikely to scale past
-16,000 vertices without caching.
-
diff --git a/assets/css/main.css b/assets/css/main.css
index 7f330c8..7ef24e5 100644
--- a/assets/css/main.css
+++ b/assets/css/main.css
@@ -28,7 +28,7 @@ header h1 {
pre, code {
font-size: inherit;
- background: #1a1200;
+ background: #222222;
color: #ffb300;
}