
I've been playing around with some things and thought up the idea of trying to figure out Kevin Bacon numbers. I have data for a site that for this purpose we can consider a social network. Let's pretend that it's Facebook (for simplification of discussion). I have people and I have a list of their friends, so I have the connections between them. How can I calculate the distance from one person to another (basically, a Kevin Bacon number)?

My best idea is a bidirectional search with a depth limit (to limit computational complexity and to avoid the problem of people who simply can't be connected in the graph), but I realize this is rather brute force.

Could it be better to make little sub-graphs (say something equivalent to groups on Facebook), calculate the shortest distances between them (ahead of time, perhaps) and then try to use THOSE to find a link? While this requires pre-calculation, it could make it possible to search many fewer nodes (nodes could be groups instead of individuals, making the graph much smaller). This would still be a bidirectional search though.

I could also pre-calculate the number of people an individual is connected to, searching the nodes for "popular" people first since they could have the best chance of connecting to the given destination individual. I realize this would be a trade-off of speed for possible shortest path. I'd think I'd also want to use a depth-first search instead of the breadth-first search I'd plan to use in the other cases.

Can someone think of a simpler/faster way of doing this? I'd like to be able to find the shortest length between two people, so it's not as easy as always having the same end point (such as in the Kevin Bacon problem).

I realize there are problems, such as ending up with chains of 200 people, but that can be solved by having a limit on the depth I'm willing to search.

Deduplicator
MBCook
  • BTW, since this is not about movies, there is no compelling reason to call it a Kevin Bacon number rather than the more familiar (to some ;-)) Erdős number: http://en.wikipedia.org/wiki/Erdos_number – ShreevatsaR Mar 24 '09 at 01:37
  • I saw that term while doing some research, but by calling it a Kevin Bacon number everyone instantly knows what I'm talking about. I figured that would cut down on the explaining. – MBCook Mar 24 '09 at 14:12
  • "degrees of separation" would also make sense – Steven A. Lowe Mar 26 '09 at 00:06

4 Answers


This is a standard shortest path problem. There are lots of solutions, including Dijkstra's algorithm and Bellman-Ford. You may be particularly interested in looking at the A* algorithm and seeing how it would perform with the cost function relative to the inverse of any particular node's degree. The idea would be to visit more popular nodes (those with higher degree) first.
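As a baseline: when every friendship counts as one hop, the shortest path can be found with a plain breadth-first search. Here is a minimal sketch, assuming the network is held in a hypothetical `friends` dict that maps each person to a set of their friends (the names `friends` and `bfs_distance` are made up for illustration):

```python
from collections import deque


def bfs_distance(friends, start, goal):
    """Return the number of hops from start to goal, or None if unreachable.

    `friends` is an assumed structure: dict of person -> set of friends.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                # Mark on discovery so nobody is queued twice.
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None
```

Any heuristic-driven variant can be checked against this, since it always returns the true shortest distance.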

tvanfosson
  • +1 As I mentioned after thinking about things for a couple minutes, Dijkstra's and Bellman-Ford will both reduce into a simple breadth-first search when the edge weights are all 1. A* is worth a look, since it adds the heuristic. Combined with a limited depth, it may be the best you can get. – Adam Jaskiewicz Mar 23 '09 at 20:44
  • A* is probably the worst of the three for this type of search because it returns only the node closest to the heuristic, while Dijkstra's algorithm returns any of the closest nodes (the first one it finds). And might thus be done sooner because you're not looking for anything specific. – Jasper Bekkers Mar 23 '09 at 21:27
  • @Jasper -- the intuition would be that shortest paths tend to go through well-connected nodes -- this would be the hypothesis to test. If true, the heuristic would give you the shortest path sooner leading you to be able to terminate other (non-shortest) potential paths earlier. – tvanfosson Mar 23 '09 at 21:42
  • @tvanfosson: using the degrees of the vertices sounds like a good idea, but A* can only find one path to a node. You can't say "give me a path from here to some node that has a high degree" because now you're looking for a group of nodes. Anyway, this is probably something to benchmark. – Jasper Bekkers Mar 24 '09 at 10:42

Sounds like a job for Dijkstra's algorithm.

ED: Eh, I shouldn't have pulled the trigger so fast. Dijkstra's (and Bellman-Ford) reduces to a breadth-first search when the weights are 1, so this isn't too useful. Oh well.

The A* algorithm, mentioned by tvanfosson, may be ideal for this. The idea is that instead of searching and recursing in whatever order the elements are in each level of the tree (rooted on your start- or end-point), you use some heuristic to determine which element you are going to try first. In your case a good bet would probably be the degree of a node (number of "friends"), but you could possibly want to use the number of people within some arbitrary number of degrees of a given person (i.e., the guy who has three friends who each have 100 friends is likely to be a better node than the guy who has 20 friends in a clique that shuns outsiders). There are all sorts of other things you could use as a heuristic (friends get 2 points, friends-of-friends get 1 point; whatever, experiment).

Combine this with a depth limit (cut off after 6 degrees of separation, or whatever), and you can vastly improve your average case (worst case is still the same as basic BFS).
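To make the idea concrete, here is a rough sketch. Note that it is not a full A*: it is a simplified, depth-limited, level-by-level search that just tries the best-connected people on each level first. The `friends` dict (person -> set of friends), the function name, and the default `max_depth` of 6 are all assumptions for illustration.

```python
def degree_ordered_distance(friends, start, goal, max_depth=6):
    """Depth-limited level-by-level search, highest-degree people first.

    Returns the hop count, or None if goal is not within max_depth hops.
    `friends` is an assumed structure: dict of person -> set of friends.
    """
    if start == goal:
        return 0
    seen = {start}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        # Heuristic: expand the people with the most friends first.
        frontier.sort(key=lambda p: len(friends.get(p, ())), reverse=True)
        next_frontier = []
        for person in frontier:
            for friend in friends.get(person, ()):
                if friend == goal:
                    return depth
                if friend not in seen:
                    seen.add(friend)
                    next_frontier.append(friend)
        if not next_frontier:
            break  # nothing left to explore
        frontier = next_frontier
    return None
```

Because each level is finished before the next one starts, the result is still the shortest distance; the ordering only changes how soon the target turns up within a level. Swapping in a different scoring function in the `sort` key is where you would experiment.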

Adam Jaskiewicz
  • Nothing's wrong with it. If you want to limit the depth to, say, 6 degrees of separation, though, it makes sense to also use some sort of heuristic to determine which node to look at next in your breadth-first search (i.e. A*). – Adam Jaskiewicz Mar 23 '09 at 20:49
  • It won't improve worst-case, but it could improve average-case. Yes, it's still BFS, but "BFS" doesn't tell the whole story. – Adam Jaskiewicz Mar 23 '09 at 20:50
  • Mainly what I meant about it "not being useful" is that bog-standard BFS had already been mentioned, and I wasn't contributing anything new by suggesting an algorithm that is more general, but reduces to the same thing in this case. I've added more ideas to my answer to hopefully make it better. – Adam Jaskiewicz Mar 23 '09 at 21:29

Run a breadth-first search in both directions (from each endpoint) and stop when you have a connection or reach your depth limit.
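Something along these lines, as a sketch only. It assumes friendships are mutual (an undirected graph) and that the data sits in a hypothetical `friends` dict mapping each person to a set of friends; `max_depth` here is the longest chain you are willing to accept.

```python
def bidirectional_distance(friends, start, goal, max_depth=6):
    """Bidirectional BFS: hop count, or None if not within max_depth hops.

    Assumes `friends` (dict of person -> set of friends) is symmetric.
    """
    if start == goal:
        return 0
    # Each side tracks the distances it has found and its current frontier.
    dist_this, dist_other = {start: 0}, {goal: 0}
    front_this, front_other = [start], [goal]
    levels_done = 0  # total levels expanded by both sides so far
    while front_this and front_other and levels_done < max_depth:
        # Always grow the smaller frontier to keep the work balanced.
        if len(front_this) > len(front_other):
            dist_this, dist_other = dist_other, dist_this
            front_this, front_other = front_other, front_this
        best = None
        next_front = []
        for person in front_this:
            for friend in friends.get(person, ()):
                if friend in dist_other:
                    # The two searches meet across this friendship.
                    total = dist_this[person] + 1 + dist_other[friend]
                    if best is None or total < best:
                        best = total
                elif friend not in dist_this:
                    dist_this[friend] = dist_this[person] + 1
                    next_front.append(friend)
        if best is not None:
            return best
        front_this = next_front
        levels_done += 1
    return None
```

Each side only has to reach roughly half the distance, which is why this usually touches far fewer people than a one-sided search.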

Steven A. Lowe

This one might be better overall: Floyd-Warshall, which computes the shortest distance between all pairs.
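A minimal sketch of what that pre-calculation could look like, assuming a hypothetical `friends` dict (person -> set of friends) in which every person appears as a key and friendships are listed in both directions:

```python
def all_pairs_distances(friends):
    """Floyd-Warshall over unit-weight friendships.

    Returns dist[a][b] = hops from a to b, or float("inf") if unconnected.
    Assumes every person is a key of `friends` (an illustrative structure).
    """
    people = list(friends)
    inf = float("inf")
    dist = {a: {b: inf for b in people} for a in people}
    for a in people:
        dist[a][a] = 0
        for b in friends[a]:
            dist[a][b] = 1  # direct friends are one hop apart
    for k in people:  # allow k as an intermediate person
        for i in people:
            if dist[i][k] == inf:
                continue  # i cannot reach k, so k cannot help i
            for j in people:
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist
```

After that, every lookup is just `dist[a][b]`. The catch is the cost: time grows with the cube of the number of people and memory with the square, so this is only realistic for a fairly small network.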

sfossen