Algorithms to find nearest nodes in a graph

Question

I have a large road network graph. Suppose I know a specific location (say, a source node). I want to find all the nearest neighbor nodes within a specific range; for example, all the locations (other nodes) within 20 kilometers of the known location / source node.

I know BFS or Dijkstra's algorithm can solve this, but I feel they are inefficient for my application, which needs to process this kind of query again and again.

So is there any other algorithm or technique to accomplish this goal?

Assume this is a weighted graph: nodes represent locations, and edges represent the distance between the two corresponding locations.

Edit: Dijkstra can solve the problem, but what about storing the results? If I store results for all possible pairs after a certain number of queries, how big would the cache be? How can I tackle that space inefficiency?

I have also heard about kd-tree, r-tree indexing etc. Are they useful in this context?

Edit: Going further, I am willing to use the neo4j graph database to build this graph. I have seen that neo4j has a special library called 'neo4j spatial' that uses R-tree indexing for this purpose, but I want to use a directed-graph approach rather than a spatial index library. Is there any way to do this?

You need to be more specific. Why do you feel BFS or Dijkstra's algorithm is inefficient? What time complexity are you targeting? Why is this tagged `java` and `c++`? — arunmoezhi, Mar 16 '14 at 07:28
Dijkstra can solve the problem, but what about storing the results? If I store results for all possible pairs after a certain number of queries, how big would the cache be? How can I tackle that space inefficiency? — K.Nath, Mar 16 '14 at 07:56
Since you have a road network, you can represent the nodes with `(latitude, longitude)` or whatever other coordinates you want to use, then throw them into a [KD tree](http://en.wikipedia.org/wiki/K-d_tree), which facilitates fast retrieval. You would end up using something like Euclidean distance instead of arbitrary graph weights, but that's probably an excellent approximation in most cases. The Wiki page has lots of links at the bottom for good implementations of KD trees. — Nicu Stiurca, Mar 16 '14 at 08:08
@SchighSchagh - I am thinking of representing all the places with some sort of coordinates. But can you tell me how I can relate Euclidean distance to the weights of the edges? I need a pointer in the right direction on how to use KD trees and how efficient they really are in this case. — K.Nath, Mar 16 '14 at 08:12
Convert the 2D coordinates to 3D points on a sphere. Otherwise, you can run into issues around the two bands (180 lat and 180 long) where the coordinates wrap around: for example, -179 and 179 appear 358 degrees apart when they are only 2 apart. There are ways around it, but they tend to be problematic and require extra checks. — Nuclearman, Mar 16 '14 at 12:15
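The two comments above can be combined as follows. This is a minimal Python sketch, not a production index; all names are hypothetical. Each (latitude, longitude) pair is mapped to a point on the unit sphere, so longitudes -179 and 179 end up close together, and the points go into a small hand-rolled 3-D KD tree. A 20 km radius is converted to the equivalent straight-line (chord) distance, assuming a spherical Earth with a mean radius of 6371 km. If edge weights are actual road lengths, the straight-line distance never exceeds the road distance, so the KD-tree hits are a superset of the nodes within 20 km by road; a bounded graph search is still needed to get exact road distances.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth
Point3 = Tuple[float, float, float]


def to_unit_vector(lat_deg: float, lon_deg: float) -> Point3:
    """Map (latitude, longitude) in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def chord_for_km(km: float) -> float:
    """Straight-line distance on the unit sphere for a great-circle distance in km."""
    return 2.0 * math.sin(km / (2.0 * EARTH_RADIUS_KM))


class KDNode:
    def __init__(self, point: Point3, node_id: int, axis: int):
        self.point, self.node_id, self.axis = point, node_id, axis
        self.left: Optional["KDNode"] = None
        self.right: Optional["KDNode"] = None


def build(items: List[Tuple[Point3, int]], depth: int = 0) -> Optional[KDNode]:
    """Build a 3-D KD tree by splitting at the median along x, y, z in turn."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    node = KDNode(items[mid][0], items[mid][1], axis)
    node.left = build(items[:mid], depth + 1)       # coordinates <= median
    node.right = build(items[mid + 1:], depth + 1)  # coordinates >= median
    return node


def radius_query(root: Optional[KDNode], center: Point3, r: float, out: List[int]) -> None:
    """Append the ids of all points within Euclidean distance r of center."""
    if root is None:
        return
    if math.dist(root.point, center) <= r:
        out.append(root.node_id)
    diff = center[root.axis] - root.point[root.axis]
    near, far = (root.left, root.right) if diff < 0 else (root.right, root.left)
    radius_query(near, center, r, out)
    if abs(diff) <= r:  # the query ball crosses the splitting plane
        radius_query(far, center, r, out)


# Hypothetical nodes; 0 and 1 straddle the 180th meridian about 11 km apart.
nodes = [(0.0, 179.95), (0.0, -179.95), (0.0, 179.0), (0.1, 179.95)]
tree = build([(to_unit_vector(lat, lon), i) for i, (lat, lon) in enumerate(nodes)])
hits: List[int] = []
radius_query(tree, to_unit_vector(*nodes[0]), chord_for_km(20.0), hits)
print(sorted(hits))  # [0, 1, 3]
```

Node 2 is roughly 105 km away and is correctly excluded, while node 1 is found despite the longitude wrap-around.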

Jono · Answer 1 · 2014-03-16T08:00:11.200


What you want to use is Dijkstra's algorithm.

It does exactly what you want: starting from a source node, it finds the nodes in order of lowest cost, until that cost reaches a specified limit (i.e. 20 km).
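A minimal Python sketch of Dijkstra's algorithm with that stopping condition, assuming the graph is stored as an adjacency-list dict (all names are hypothetical). The sample graph is the directed road graph from the InfiniteGraph answer further down; with a limit of 8.0 it returns the same towns as that answer's query.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, edge length)]


def within_range(graph: Graph, source: str, limit: float) -> Dict[str, float]:
    """Return {node: shortest distance} for every node other than source within limit.

    Ordinary Dijkstra with an early exit: nodes leave the heap in order of
    increasing distance, so the first one beyond limit ends the search.
    """
    best: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    settled: Dict[str, float] = {}
    while heap:
        d, u = heapq.heappop(heap)
        if d > limit:
            break  # every remaining heap entry is at least this far away
        if u in settled:
            continue  # stale entry; u was already settled at a shorter distance
        settled[u] = d
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    settled.pop(source, None)
    return settled


roads: Graph = {
    "A": [("B", 4.0), ("D", 3.0), ("E", 4.0)],
    "B": [("C", 5.0)],
    "C": [("D", 6.0), ("H", 10.0)],
    "D": [("G", 8.0), ("H", 8.0)],
    "E": [("D", 2.0), ("F", 4.0)],
    "F": [("G", 3.0)],
    "G": [("H", 8.0)],
}
print(sorted(within_range(roads, "A", 8.0).items()))
# [('B', 4.0), ('D', 3.0), ('E', 4.0), ('F', 8.0)]
```

Because the search stops at the limit, it only explores the part of the graph inside the radius (plus the edges leaving it), not the whole network.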

> I feel these are inefficient in my application as my application needs to process these kind of queries again and again.

Have you thought about caching the results for a given source node? As long as the graph never changes, these will never need to be recalculated.
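One way to cache without letting memory grow to all pairs is a bounded LRU cache around the range query. The sketch below assumes a static graph (so cached results never go stale) and uses Python's `functools.lru_cache`; the graph, the function name and the cap of 1024 entries are hypothetical. If the graph does change, `cached_range_query.cache_clear()` discards everything.

```python
import heapq
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical static graph (directed roads, km), same as the InfiniteGraph sample data.
ROADS: Dict[str, List[Tuple[str, float]]] = {
    "A": [("B", 4.0), ("D", 3.0), ("E", 4.0)],
    "B": [("C", 5.0)],
    "C": [("D", 6.0), ("H", 10.0)],
    "D": [("G", 8.0), ("H", 8.0)],
    "E": [("D", 2.0), ("F", 4.0)],
    "F": [("G", 3.0)],
    "G": [("H", 8.0)],
}


@lru_cache(maxsize=1024)  # hypothetical cap; least recently used results are evicted
def cached_range_query(source: str, limit: float) -> Tuple[Tuple[str, float], ...]:
    """Bounded Dijkstra; repeated calls with the same arguments hit the cache."""
    best: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    done: Dict[str, float] = {}
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry; u was already settled at a shorter distance
        done[u] = d
        for v, w in ROADS.get(u, []):
            nd = d + w
            if nd <= limit and nd < best.get(v, float("inf")):  # never enqueue beyond limit
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    done.pop(source, None)
    return tuple(sorted(done.items()))  # immutable, so the cached value cannot be altered


print(cached_range_query("A", 8.0))  # (('B', 4.0), ('D', 3.0), ('E', 4.0), ('F', 8.0))
print(cached_range_query("A", 8.0))  # same result, served from the cache
print(cached_range_query.cache_info())
```

Memory is then bounded by `maxsize` times the size of one neighbourhood rather than by the number of node pairs.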

If your graph is too big, there's also the option of a Hierarchical Graph: it abstracts the graph into portions and pre-processes paths between those portions. The link here refers specifically to A*, but the abstraction it uses can be applied to any search method.

Edit: the Transit Nodes from Mehrdad's answer is a Hierarchical graph using Dijkstra's search specifically.

It's also worth considering whether you need a graph at all. If your nodes sit on a linear space and destination.position - source.position always gives the exact distance, then it's quicker to store them in a sorted list.
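For that linear case, a sorted list plus binary search answers each range query in O(log n + k) time, where k is the number of results. A minimal sketch using Python's `bisect` module; the stop names and positions are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def build_index(stops: List[Tuple[float, str]]) -> Tuple[List[float], List[str]]:
    """Sort (position_km, name) pairs once and return parallel lists."""
    ordered = sorted(stops)
    return [pos for pos, _ in ordered], [name for _, name in ordered]


def within(positions: List[float], names: List[str], source_km: float, limit_km: float) -> List[str]:
    """All stops whose position lies in [source_km - limit_km, source_km + limit_km]."""
    lo = bisect_left(positions, source_km - limit_km)
    hi = bisect_right(positions, source_km + limit_km)
    return names[lo:hi]


# Hypothetical stops along a single road, positions in km from one end.
positions, names = build_index(
    [(31.0, "farm"), (0.0, "depot"), (60.0, "town"), (12.0, "mill"), (25.0, "bridge")]
)
print(within(positions, names, 25.0, 20.0))  # ['mill', 'bridge', 'farm']
```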

edited Mar 16 '14 at 08:00

answered Mar 16 '14 at 07:30

Jono



I see you have given a link for `Hierarchical Graph`. Can you please add two lines to introduce it and explain why you think it is useful here? That would make your answer more self-sufficient. – arunmoezhi Mar 16 '14 at 07:34
I have thought about caching, but the problem is that my application is supposed to process thousands of nodes (say more than 50,000 nodes). A shortest-path query can arise for any of the nodes, so caching all the results would consume a lot of memory, since a shortest path may exist for any pair of nodes. That's why I am looking for other algorithms that are more efficient at runtime, so that running them again and again doesn't affect the overall application much. – K.Nath Mar 16 '14 at 07:49
@K.Nath, I'd recommend having a read of Mehrdad's Transit nodes link. They use the USA road network with 24 million nodes and achieve query times of under a millisecond thanks to heavy pre-processing and hierarchy abstraction. – Jono Mar 16 '14 at 08:04

score 2 · Answer 2 · answered Mar 16 '14 at 07:46


I don't know a lot about them, but here are some techniques I've heard about that might be helpful:

Here's also a talk on something called "Highway dimension" that can be used to prove time bounds on these techniques.

answered Mar 16 '14 at 07:46

user541686


Cahit Gungor · Answer 3 · 2014-03-19T08:59:29.980

Your initial definition is a single-source, all-destinations shortest-path problem with a stopping condition. But you added that you run these queries again and again, so the problem becomes the all-pairs shortest-paths problem. This is generally solved with dynamic programming; the Floyd–Warshall algorithm is an example solution, with O(V^3) time complexity and O(V^2) space complexity.
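A minimal Python sketch of Floyd–Warshall over an edge list, followed by the range lookup it enables; the function names are hypothetical. The sample edges are the directed roads from the InfiniteGraph answer further down, with towns A to H numbered 0 to 7.

```python
from typing import List, Tuple

INF = float("inf")


def floyd_warshall(n: int, edges: List[Tuple[int, int, float]]) -> List[List[float]]:
    """All-pairs shortest paths: O(n^3) time, O(n^2) memory for the distance matrix."""
    dist = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the shortest of any parallel edges
    for k in range(n):  # allow paths through intermediate nodes 0..k
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == INF:
                continue  # k is unreachable from i, nothing to relax
            row_i = dist[i]
            for j in range(n):
                if d_ik + row_k[j] < row_i[j]:
                    row_i[j] = d_ik + row_k[j]
    return dist


def nodes_within(dist: List[List[float]], source: int, limit: float) -> List[int]:
    """After preprocessing, a range query is a scan of one matrix row."""
    return [j for j, d in enumerate(dist[source]) if j != source and d <= limit]


TOWNS = "ABCDEFGH"
edges = [(0, 1, 4.0), (1, 2, 5.0), (2, 3, 6.0), (2, 7, 10.0), (0, 3, 3.0), (0, 4, 4.0),
         (4, 3, 2.0), (4, 5, 4.0), (5, 6, 3.0), (3, 6, 8.0), (3, 7, 8.0), (6, 7, 8.0)]
dist = floyd_warshall(len(TOWNS), edges)
print([TOWNS[j] for j in nodes_within(dist, 0, 8.0)])  # ['B', 'D', 'E', 'F']
```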

Given Floyd–Warshall, the graph should be partitioned according to your range limit, with overlapping regions, to get rid of the O(V^2) space complexity.

For instance:

Take a 200 km area. The first square is centered at x_1 = 20 km, y_1 = 20 km and has 40 km sides; the second is centered at x_2 = 40 km, y_2 = 40 km, and so on. Each quarter of a square is therefore covered by four squares. Floyd–Warshall is run on every partition. This is much better than the O(V^2) space complexity of a single calculation over all nodes. According to my calculations, given 50K nodes, the original algorithm requires storing 2.5B entries, while the proposed one requires storing 1M.
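The storage figures depend heavily on how the nodes are spread out, so here is a rough script for checking them. It assumes, hypothetically, that the "200 km area" is a 200 km × 200 km square, that the 50K nodes are spread uniformly, and that tile centres are placed every 20 km starting at (20 km, 20 km). Under those assumptions the full matrix needs 50,000² = 2.5 × 10^9 entries, matching the answer; the tiled count is only an order-of-magnitude estimate for that uniform layout.

```python
def full_matrix_entries(num_nodes: int) -> int:
    """One distance entry per ordered pair of nodes."""
    return num_nodes * num_nodes


def tiled_entries(area_km: float, tile_km: float, step_km: float, num_nodes: int) -> int:
    """Entries for overlapping square tiles, assuming nodes are spread uniformly."""
    density = num_nodes / (area_km * area_km)  # nodes per km^2 (uniformity assumption)
    nodes_per_tile = round(density * tile_km * tile_km)
    centres_per_axis = int((area_km - tile_km) // step_km) + 1
    return centres_per_axis ** 2 * nodes_per_tile ** 2  # one n^2 matrix per tile


print(full_matrix_entries(50_000))               # 2500000000
print(tiled_entries(200.0, 40.0, 20.0, 50_000))  # 324000000
```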

After creating the result matrices, you can look up the nodes within the range limit directly.

djhallx · Answer 4 · 2021-07-24T11:52:33.117

Using a graph database approach, we can use the InfiniteGraph graph database and the DO Query Language. We create a weight calculator and then use it in the query, which looks like the following:

CREATE WEIGHT CALCULATOR shortestRoute {
            minimum:    0,
            default:    0, 
            edges: {
                ()-[r:Road]->(): r.distance
            }
};
    
Match m = max weight 8.0 shortestRoute 
            ((:Town {name == 'A'})-[*..10]->(t:Town)) 
            GROUP BY t.name 
            RETURN t.name as Name;

In this query we specify the "max weight" of 8.0 and the WEIGHT CALCULATOR we want to use (shortestRoute). Then we specify the starting Town {name == 'A'} and the maximum number of hops to follow, [*..10]. Then we specify the end point (t:Town) with no predicate, which matches any node of type Town and binds it to the name t. Finally, we group by t.name and return t.name as Name.

The graph used in this query (towns A to H connected by directed roads) is created by the setup script at the end of this answer.

And the query results are as follows:

DO> Match m = max weight 8.0 shortestRoute ((:Town {name == 'A'})-[*..10]->(t:Town)) GROUP BY t.name RETURN t.name as Name;

{
  _Projection
  {
    Name:'B'
  },
  _Projection
  {
    Name:'D'
  },
  _Projection
  {
    Name:'E'
  },
  _Projection
  {
    Name:'F'
  }
}

The setup (schema and sample data) is as follows:

UPDATE SCHEMA {
    
    CREATE CLASS Town  {
        name                : STRING,
                
        roadsIn             : List { Element: Reference { EdgeClass: Road, EdgeAttribute: from }, CollectionTypeName: TreeListOfReferences },
        roadsOut            : List { Element: Reference { EdgeClass: Road, EdgeAttribute: to }, CollectionTypeName: TreeListOfReferences }      
    }
    
    CREATE CLASS Road {
        name                : String,
    
        from                : Reference { referenced: Town, Inverse: roadsOut },
        to                  : Reference { referenced: Town, Inverse: roadsIn },
                
        distance            : REAL { Storage: B32 },
        avgTravelTime       : REAL { Storage: B32 },
        stopLightCount      : INTEGER { Encoding: Signed, Storage: B16 }
    }
};


let townA = create Town { name: "A" };
let townB = create Town { name: "B" };
let townC = create Town { name: "C" };
let townD = create Town { name: "D" };
let townE = create Town { name: "E" };
let townF = create Town { name: "F" };
let townG = create Town { name: "G" };
let townH = create Town { name: "H" };

let ab = create Road { name: "AB", distance: 4.0, stopLightCount: 1, from: $townA, to: $townB };

let bc = create Road { name: "BC", distance: 5.0, stopLightCount: 2, from: $townB, to: $townC };

let cd = create Road { name: "CD", distance: 6.0, stopLightCount: 3, from: $townC, to: $townD };

let cH = create Road { name: "CH", distance: 10.0, stopLightCount: 0, from: $townC, to: $townH };

let ad = create Road { name: "AD", distance: 3.0, stopLightCount: 0, from: $townA, to: $townD };

let ae = create Road { name: "AE", distance: 4.0, stopLightCount: 3, from: $townA, to: $townE };

let ed = create Road { name: "ED", distance: 2.0, stopLightCount: 1, from: $townE, to: $townD };

let ef = create Road { name: "EF", distance: 4.0, stopLightCount: 7, from: $townE, to: $townF };

let fg = create Road { name: "FG", distance: 3.0, stopLightCount: 0, from: $townF, to: $townG };

let dg = create Road { name: "DG", distance: 8.0, stopLightCount: 6, from: $townD, to: $townG };

let dH = create Road { name: "DH", distance: 8.0, stopLightCount: 9, from: $townD, to: $townH };

let gH = create Road { name: "GH", distance: 8.0, stopLightCount: 4, from: $townG, to: $townH };

The query prunes candidate paths early, so it loads only the data it needs to determine the results.

score -1 · Answer 5 · answered Mar 16 '14 at 07:22


If the graph is represented by an adjacency matrix or an adjacency list, you need to scan only one row (for the matrix) or one list (for the adjacency list), so this operation is not that complex.

For a graph with n nodes, the adjacency matrix, say graph[][], is of size n*n. If the source node is s, simply scan graph[s][i] for i from 0 to n-1 and check whether graph[s][i] <= DISTANCE.
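A minimal Python sketch of that row scan, applied to the directed road graph from the InfiniteGraph answer above (towns A to H numbered 0 to 7; the helper names are hypothetical). Missing roads are stored as infinity so they never pass the check. With a limit of 8.0 from A, the scan reports B, D and E, but misses F, which is exactly 8 away via E. This is the 1-hop limitation raised in the first comment below.

```python
from typing import List, Tuple

INF = float("inf")
TOWNS = "ABCDEFGH"


def adjacency_matrix(n: int, edges: List[Tuple[int, int, float]]) -> List[List[float]]:
    """n x n matrix of direct edge lengths; INF marks 'no direct road'."""
    m = [[INF] * n for _ in range(n)]
    for u, v, w in edges:
        m[u][v] = w
    return m


def scan_row(graph: List[List[float]], s: int, distance: float) -> List[int]:
    """Check graph[s][i] <= distance for every i: finds direct neighbours only."""
    return [i for i in range(len(graph)) if i != s and graph[s][i] <= distance]


edges = [(0, 1, 4.0), (1, 2, 5.0), (2, 3, 6.0), (2, 7, 10.0), (0, 3, 3.0), (0, 4, 4.0),
         (4, 3, 2.0), (4, 5, 4.0), (5, 6, 3.0), (3, 6, 8.0), (3, 7, 8.0), (6, 7, 8.0)]
m = adjacency_matrix(len(TOWNS), edges)
print([TOWNS[i] for i in scan_row(m, 0, 8.0)])  # ['B', 'D', 'E']
```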

answered Mar 16 '14 at 07:22

Bhavesh Munot


1. Graphs are usually represented in one of the two data structures you have listed, so what is your point? 2. By scanning only one row you get only the 1-hop neighbours. – arunmoezhi Mar 16 '14 at 07:26
Well, this is not an acceptable answer. I need to find all the locations that are within a range. That doesn't mean I want to find only the nodes directly connected to the source node by one edge. Distance is the factor here, not how many edges lie between two nodes. – K.Nath Mar 16 '14 at 07:39
