summaryrefslogtreecommitdiffstats
path: root/_log/neo4j-a-star-search.md
blob: 9f4a7d20f5ab5a4b84c6ee6c9230bcad065b6149 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
---
title: That time I forked Neo4J to implement A* search
date: 2018-03-06
layout: post
---

Written in 2026, backdated to 2018. 

Our vessel tracking system hit a wall. We needed shortest paths through a
13,000-vertex graph. But the system slowed to a crawl after 4,000.  Kristian,
our CTO, suggested replacing Dijkstra's search with the faster A* search—an
algorithm I had encountered as an undergraduate.

The database we used, Neo4J, employed a plug-in system for graph algorithms—a
mature open-source project developed by experts. Forking it for our unassuming
start-up was bold. Often, application developers are reluctant to step into
lower layers.

It took me a week to set up the toolchain, build the project, and run the
tests. I reviewed Dijkstra's implementation, separating logic from
infrastructure. I then refactored the algorithm to use the haversine function
to steer the search:

```
private double computeHeuristic(
    final double lat1, final double lon1,
    final double lat2, final double lon2) {
    final int earthRadius = 6371;
    final double kmToNM = 0.539957;

    final double latDistance = Math.toRadians(lat2 - lat1);
    final double lonDistance = Math.toRadians(lon2 - lon1);

    final double a = Math.sin(latDistance / 2) 
        * Math.sin(latDistance / 2)
        + Math.cos(Math.toRadians(lat1)) 
        * Math.cos(Math.toRadians(lat2))
        * Math.sin(lonDistance / 2) 
        * Math.sin(lonDistance / 2);

    final double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    return earthRadius * c * kmToNM;
}
```

And revised the core search loop to update the costs when a better path is
found:

```
private void updateCosts(
    final int source, final int target,
    final double newCost, final double heuristic) {
    final double oldCost = gCosts.getOrDefault(target, Double.MAX_VALUE);
    if (newCost < oldCost) {
        gCosts.put(target, newCost);
        fCosts.put(target, newCost + heuristic);
        path.put(target, source);
    }
}
```

I remember celebrating a 300x speed up, and comfortably scaling to the target
13,000. 

I was even more thrilled when the upstream project accepted my changes, and
invited me to contribute. In hindsight, I'm a little surprised that my changes
were accepted in their original form. The heuristic was tightly coupled to our
use case. Decoupling it would require significant refactoring and interface
changes.

It was, for a time, the best outcome I could have imagined.

My celebration of success, however, was short-lived. In a matter of months, as
the database grew larger and denser—to 16,000 vertices, if memory serves—A*
search too began to struggle. We fell back on a far less elegant PostgreSQL
cache. I learnt my first bitter lesson about optimization: dumb caches beat
clever algorithms. I briefly wondered if I should have stayed with pure
mathematics.

Source code: <a
href="https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases/tag/3.4.0.0"
class="external" target="_blank" rel="noopener noreferrer">Neo4J v3.4.0</a> |
<a
href="https://github.com/neo4j-contrib/neo4j-graph-algorithms/blob/bd9732d9a690319552e134708692acb5a0d6b37c/algo/src/main/java/org/neo4j/graphalgo/impl/ShortestPathAStar.java"
class="external" target="_blank" rel="noopener noreferrer">Full source</a>.