Suppose you have an input file:

<total vertices>
<x-coordinate 1st location><y-coordinate 1st location>
<x-coordinate 2nd location><y-coordinate 2nd location>
<x-coordinate 3rd location><y-coordinate 3rd location>
...

How can Prim's algorithm be used to find the MST for these locations? I understand this problem is typically solved using an adjacency matrix. Any references would be great if applicable.

Bob John
  • I don't understand this question. You have your problem and you know of an algorithm that solves it. Do you not understand how Prim's works? Are you not sure how to implement Prim's? Do you not understand how Prim's helps you solve this problem? Does this help? http://en.wikipedia.org/wiki/Prim's_algorithm – rliu Apr 14 '13 at 06:26
  • I guess that this implies a complete graph (one where every vertex is connected to every other vertex). Is that the part that you're missing for understanding the question? Also, since it's talking about coordinates, I guess that the weight of each edge is the Euclidean distance (use the hypot() function). – Ulrich Eckhardt Apr 14 '13 at 10:57
  • That's a good point. The question doesn't specify which nodes are connected... I think I assumed that there was a section called "edges" where it listed pairs of points. But it's certainly possible that it's a complete graph or something else. – rliu Apr 14 '13 at 11:08

2 Answers

If you already know Prim's algorithm, this is easy: create an adjacency matrix where adj[i][j] is the distance between location i and location j.
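
A minimal sketch of building that matrix, assuming the graph is complete and the edge weight is the Euclidean distance (neither is stated in the question). The names build_adjacency_matrix and points are made up for illustration, and the coordinates are sample data rather than anything from the asker's file.

```python
import math

def build_adjacency_matrix(points):
    """Return adj where adj[i][j] is the Euclidean distance between points i and j.

    Assumes a complete graph: every pair of locations is joined by an edge.
    """
    n = len(points)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            adj[i][j] = d  # the matrix is symmetric,
            adj[j][i] = d  # so fill both halves at once
    return adj

# Hypothetical sample locations; the index in the list is the vertex number.
points = [(0, 0), (3, 4), (3, 0)]
adj = build_adjacency_matrix(points)
```

The order in which the locations appear in the file does not matter here: it only decides which index each location gets.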

marcadian
  • I'm just slightly confused, as the locations can be given in an arbitrary order. – Bob John Apr 14 '13 at 07:00
  • Can you explain why you think the order would make any difference? – marcadian Apr 14 '13 at 07:52
  • Because how can I determine which locations are adjacent to each other? Do I have to search every location after every iteration? – Bob John Apr 14 '13 at 08:43
  • @BobJohn The assumption you should probably make is that every location is connected to every other location, with each connection weighted by the distance between the two. This really should have been stated in the assignment (if this was an assignment). As the question stands, this answer is most likely correct. – Bernhard Barker Apr 14 '13 at 10:58
  • If the graph is complete, is there any way to do this problem besides calculating the distances between every pair of points? – Bob John Apr 14 '13 at 20:41

I'm just going to describe some implementations of Prim's and hopefully that gets you somewhere.

First off, your question doesn't specify how edges are input to the program. You have a total number of vertices and the locations of those vertices. How do you know which ones are connected?

Assuming you have the edges and their weights (as @doomster said above, the weight may be the planar distance between the points, since they are coordinates), we can start thinking about our implementation. Wikipedia describes three different data structures that result in three different run times: http://en.wikipedia.org/wiki/Prim's_algorithm#Time_complexity

The simplest is the adjacency matrix. As you might guess from the name, the matrix describes which nodes are "adjacent". To be precise, it has |V| rows and |V| columns (where |V| is the number of vertices). The value at adjacencyMatrix[i][j] varies depending on the usage; in our case it's the weight of the edge (i.e. the distance) between nodes i and j. This means that you need to index the vertices in some way. For instance, you might add the vertices to a list and use their position in the list.

Now using this adjacency matrix our algorithm is as follows:

  1. Create a dictionary that maps each vertex to its "distance", i.e. the weight of the cheapest known edge connecting it to the tree built so far. Initially the distance of every node is infinity.
  2. Create another dictionary to keep track of "parents". We use this to generate the MST. It's more natural to keep track of edges, but it's actually easier to implement by keeping track of "parents". Note that if you root a tree (i.e. designate some node as the root), then every node (other than the root) has precisely one parent. So by producing this dictionary of parents we'll have our MST!
  3. Create a new list with a randomly chosen node v from the original list.
    1. Remove v from the distance dictionary and add it to the parent dictionary with null as its parent (i.e. it's the "root").
    2. Go through the row in the adjacency matrix for that node. For any node w that is connected to v (for non-connected pairs you have to set the adjacency matrix value to some special value: 0, -1, int max, etc.), update its "distance" in the dictionary to adjacencyMatrix[v][w] and set its parent to v. The idea is that it's not "infinitely far away" anymore... we know we can get there from v.
  4. While the distance dictionary is not empty (i.e. while there are nodes we still need to connect to)
    1. Look over the dictionary, find the vertex x with the smallest distance, and remove x from the dictionary
    2. Add x to our new list of vertices
    3. For each neighbor of x that is still in the dictionary, if adjacencyMatrix[x][neighbor] is smaller than distance[neighbor], set distance[neighbor] to adjacencyMatrix[x][neighbor] and set the neighbor's parent to x. Basically, if there is a cheaper way to connect neighbor to the tree, then the distance dictionary should be updated to reflect that; and when we later add neighbor to the new list we know which edge we actually added (because the parent dictionary says that its parent was x). The parent must only change when the distance improves, otherwise it would point at the wrong edge.
  5. We're done. Output the MST however you want (everything you need is contained in the parents dictionary)
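
The steps above can be sketched roughly as follows. This is one possible reading of the outline, not a reference implementation: the function name prim_mst, the no_edge marker, and the fixed root (instead of a random one) are my own choices, and the small matrix is made-up sample data.

```python
import math

def prim_mst(adj, root=0, no_edge=math.inf):
    """Return a {vertex: parent} dictionary describing an MST.

    adj[i][j] is the weight of edge (i, j); no_edge marks a missing edge.
    """
    n = len(adj)
    distance = {v: math.inf for v in range(n)}  # step 1
    parent = {}                                 # step 2

    # Step 3: start the tree at the root and record its direct neighbors.
    del distance[root]
    parent[root] = None
    for w in distance:
        if adj[root][w] != no_edge:
            distance[w] = adj[root][w]
            parent[w] = root

    # Step 4: repeatedly attach the closest vertex not yet in the tree.
    while distance:
        x = min(distance, key=distance.get)     # O(|V|) scan
        if distance[x] == math.inf:
            raise ValueError("graph is not connected")
        del distance[x]
        for w in distance:
            # Only change the parent when the edge from x is strictly cheaper.
            if adj[x][w] != no_edge and adj[x][w] < distance[w]:
                distance[w] = adj[x][w]
                parent[w] = x
    return parent                               # step 5

# Hypothetical complete graph on three vertices.
adj = [
    [0, 5, 3],
    [5, 0, 4],
    [3, 4, 0],
]
parent = prim_mst(adj)
total = sum(adj[v][p] for v, p in parent.items() if p is not None)
```

Here parent comes out as {0: None, 1: 2, 2: 0}, i.e. the edges (0, 2) and (2, 1) with a total weight of 7. The "new list of vertices" from the outline is implicit: it is just the set of keys already in parent.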

I admit there is a bit of a leap from the Wikipedia page to the actual implementation as outlined above. I think the best way to approach this gap is to just brute force the code. By that I mean, if the pseudocode says "find the min [blah] such that [foo] is true" then write whatever code you need to perform that, and stick it in a separate method. It'll definitely be inefficient, but it'll be a valid implementation. The issue with graph algorithms is that there are 30 ways to implement them and they are all very different in performance; the Wikipedia page can only describe the algorithm conceptually. The good thing is that once you implement it some way, you can find optimizations quickly ("oh, if I keep track of this state in this separate data structure, I can make this lookup way faster!").

By the way, the runtime of this is O(|V|^2). I'm too lazy to detail that analysis, but loosely it's because:

  1. All initialization is O(|V|) at worst
  2. We do the loop O(|V|) times and take O(|V|) time to look over the dictionary to find the minimum node. So basically the total time to find the minimum node multiple times is O(|V|^2).
  3. The time it takes to update the distance dictionary is O(|E|) because we only process each edge once. Since |E| is O(|V|^2) this is also O(|V|^2)
  4. Keeping track of the parents is O(|V|)
  5. Outputting the tree is O(|V| + |E|) = O(|E|) at worst
  6. Adding all of these (none of them should be multiplied except within (2)) we get O(|V|^2)

The implementation with a heap is O(|E|log|V|) and it's very similar to the above. The only difference is that updating the distance is O(log|V|) instead of O(1) (because it's a heap), BUT finding/removing the min element is O(log|V|) instead of O(|V|) (because it's a heap). The time complexity analysis is quite similar and you end up with something like O(|V|log|V| + |E|log|V|) = O(|E|log|V|) as desired.
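
A rough sketch of the heap version, using Python's heapq. Note that heapq has no "update the distance" operation, so this swaps in the common lazy-deletion variant: it pushes a new entry whenever it sees an edge to a vertex outside the tree and skips stale entries when they are popped. That makes it O(|E|log|E|), which is still O(|E|log|V|) because |E| is at most |V|^2. The name prim_mst_heap and the adjacency-list layout (neighbors[v] is a list of (weight, w) pairs) are assumptions for illustration.

```python
import heapq

def prim_mst_heap(neighbors, root=0):
    """Return a {vertex: parent} dictionary describing an MST.

    neighbors[v] is a list of (weight, w) pairs for the edges leaving v.
    Vertices unreachable from root are simply left out of the result.
    """
    parent = {root: None}
    # Heap entries are (weight, vertex, vertex it would be attached to).
    heap = [(weight, w, root) for weight, w in neighbors[root]]
    heapq.heapify(heap)

    while heap and len(parent) < len(neighbors):
        weight, x, via = heapq.heappop(heap)
        if x in parent:
            continue  # stale entry: x was already attached by a cheaper edge
        parent[x] = via
        for w_weight, w in neighbors[x]:
            if w not in parent:
                heapq.heappush(heap, (w_weight, w, x))
    return parent

# Hypothetical graph: the same three vertices and weights as a triangle.
neighbors = {
    0: [(5, 1), (3, 2)],
    1: [(5, 0), (4, 2)],
    2: [(3, 0), (4, 1)],
}
heap_parent = prim_mst_heap(neighbors)
```

For a complete graph built from coordinates, |E| is about |V|^2 / 2, so the heap gives no advantage over the plain O(|V|^2) scan; it pays off on sparse graphs.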

Actually... I'm a bit confused why the adjacency matrix implementation cares about it being an adjacency matrix. It could just as well be implemented using an adjacency list. I think the key part is how you store the distances. I could be way off in my implementation outlined above, but I am pretty sure it implements Prim's algorithm and satisfies the time complexity constraints outlined by Wikipedia.

rliu
  • If the graph is complete, is there any way to do this problem besides calculating the distances between every pair of points? – Bob John Apr 15 '13 at 20:10
  • Conceptually Prim's doesn't care about the distance between a pair of points... ever. Prim's only ever considers the distance of an adjacent node to the "tree so far" (i.e. a node that is connected by an edge to the "tree so far"). If your question is actually "is it possible to find the MST without looking at every edge?" then the answer is no. And the proof is pretty trivial (how can you know your MST is minimal if you ignored an edge?) – rliu Apr 16 '13 at 02:22