I've implemented Kruskal's algorithm in C++ using the disjoint-set data structure according to Wikipedia like this:

#include <stdio.h>
#include <algorithm>
#define MAX_EDGES 10000000
#define MAX_VERTICES 200001
using namespace std;
int num_edges,num_vertices;
int total_cost=0;

struct edge{
    int v1,v2;
    int cost;
};

struct comp{
    bool operator()(const edge& e1,const edge& e2){
        return e1.cost<e2.cost;
    }
};

edge edges[MAX_EDGES];
int parent[MAX_VERTICES];
int rank[MAX_VERTICES];

int findset(int x){
    if(x!=parent[x]){
        parent[x]=findset(parent[x]);
    }
    return parent[x];
}

void merge(int x,int y){
    int px=findset(x),py=findset(y);
    if(rank[px]>rank[py]){
        parent[py]=px;
    }else{
        parent[px]=py;
    }
    if(rank[px]==rank[py]){
        ++rank[py];
    }
}

int main(){
    FILE* in=fopen("input","r");
    FILE* out=fopen("output","w");
    fscanf(in,"%d %d\n",&num_vertices,&num_edges);
    for(int i=1;i<=num_vertices;++i){
        parent[i]=i;
        rank[i]=0;
    }
    for(int i=0;i<num_edges;++i){
        fscanf(in,"%d %d %d\n",&edges[i].v1,&edges[i].v2,&edges[i].cost);
    }
    sort(edges,edges+num_edges,comp());
    for(int i=0;i<num_edges;++i){
        int s1=findset(edges[i].v1),s2=findset(edges[i].v2);
        if(s1!=s2){
            merge(s1,s2);
            total_cost+=edges[i].cost;
        }
    }
    fprintf(out,"%d\n",total_cost);
}

My question is: Do I need these two lines of code? If so, what's their importance?

  1. int px=findset(x),py=findset(y); in merge instead of int px=parent[x],py=parent[y];
  2. parent[x]=findset(parent[x]); in findset instead of return findset(parent[x]);
templatetypedef
Alexandros

2 Answers

1) findset(x) returns the canonical representative of the set that x is in (the root of its tree). You need this to check whether two elements are in the same set: they are if and only if they have the same representative. parent[x] only returns the immediate parent of x in the tree, which may not be the root.

1a) You forgot to test for px and py being identical in merge.

2) It's an optimization (path compression) so that future calls to findset run faster. If x's parent was a node that in turn pointed to the root of the set's tree, then after this call parent[x] points directly to the root.
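
Both points can be checked on a small hand-built tree. The following is only a minimal sketch: `findset` mirrors the version in the question, while the `std::vector` storage and the `make_chain` helper are assumptions introduced for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-alone forest; findset mirrors the question's version.
std::vector<int> parent;

int findset(int x) {
    if (x != parent[x]) {
        parent[x] = findset(parent[x]);  // path compression: re-point x at the root
    }
    return parent[x];
}

// Illustrative helper: builds the chain 1 -> 2 -> 3 -> 4.
// Vertex 4 is the root; index 0 is unused.
void make_chain() {
    parent = {0, 2, 3, 4, 4};
}
```

After `make_chain()`, `parent[1]` is 2 while `findset(1)` is 4, so comparing `parent[]` values would wrongly treat 1 and 4 as belonging to different sets. Once `findset(1)` has run, `parent[1]` and `parent[2]` both hold 4, so later lookups take a single step.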

dfan
  • The class representatives, however, have already been stored in s1 and s2, so there is (probably) no need to call findset again. In addition, according to http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=disjointDataStructure, there is no need to check if px and py are identical. – Alexandros Mar 26 '11 at 12:50
  • If you don't check that they're not identical, you can end up increasing the rank by one for no good reason -- this can mess up the union-by-rank heuristic down the road... – Stuart Golodetz Dec 23 '11 at 11:28
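
The effect described in this comment can be reproduced with a guarded `merge`. This is a sketch under assumptions, not the code from the question: the `init` helper, the `std::vector` storage and the name `rnk` (chosen to avoid a clash with `std::rank`) are introduced here for illustration.

```cpp
#include <cassert>
#include <vector>

std::vector<int> parent, rnk;  // rnk: hypothetical name, avoids clashing with std::rank

// Illustrative helper: makes every vertex 0..n its own singleton set.
void init(int n) {
    parent.resize(n + 1);
    rnk.assign(n + 1, 0);
    for (int i = 0; i <= n; ++i) parent[i] = i;
}

int findset(int x) {
    if (x != parent[x]) parent[x] = findset(parent[x]);
    return parent[x];
}

void merge(int x, int y) {
    int px = findset(x), py = findset(y);
    if (px == py) return;  // already in one set: leave the ranks untouched
    if (rnk[px] > rnk[py]) {
        parent[py] = px;
    } else {
        parent[px] = py;
        if (rnk[px] == rnk[py]) ++rnk[py];  // equal ranks: the new root grows by one
    }
}
```

With the guard in place, calling `merge(1, 2)` twice leaves the root's rank at 1. Without it, the second call would see `px == py`, take the `else` branch, and raise the rank to 2 even though the tree did not get any taller.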
  1. You need this because parent[x] is not necessarily the representative of the class to which x belongs, so the algorithm wouldn't be correct without it.
  2. Without the assignment, the algorithm would still be correct but slower. This is the path compression optimization, which is also discussed on Wikipedia.
Fred Foo
  • The class representatives, however, have already been stored in s1 and s2, so there is (probably) no need to call findset again. – Alexandros Mar 26 '11 at 12:47
  • @Alexandros: I hadn't seen that. I would stick to the usual formulation of disjoint sets and remove the `findset` calls from `main`. – Fred Foo Mar 26 '11 at 12:52
  • @Alexandros: Skip the `int s1=findset(edges[i].v1), s2=findset(edges[i].v2);` and call `merge` on `edges[i].v1` and `v2` directly. You may also skip the `findset` in `merge`, but the former option is idiomatic and safer. – Fred Foo Mar 26 '11 at 12:57
  • @larsmans: This way, however, I will not know which edges should be added to the MST. – Alexandros Mar 26 '11 at 13:01
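
One possible way to reconcile the two positions in this thread is to let `merge` report whether it actually joined two sets, so that the caller can still record the accepted edges. The sketch below assumes C++11; the `DisjointSets` wrapper, the `bool` return value and the `kruskal` function are illustrative choices that do not appear in the thread.

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

struct Edge { int v1, v2, cost; };

// Hypothetical wrapper around the question's parent/rank arrays.
struct DisjointSets {
    std::vector<int> parent, rnk;
    explicit DisjointSets(int n) : parent(n + 1), rnk(n + 1, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // each vertex is its own root
    }
    int findset(int x) {
        if (x != parent[x]) parent[x] = findset(parent[x]);  // path compression
        return parent[x];
    }
    // Returns true if x and y were in different sets and have now been merged.
    bool merge(int x, int y) {
        int px = findset(x), py = findset(y);
        if (px == py) return false;
        if (rnk[px] > rnk[py]) std::swap(px, py);  // make py the higher-ranked root
        parent[px] = py;
        if (rnk[px] == rnk[py]) ++rnk[py];
        return true;
    }
};

// Returns the edges chosen for the MST; vertices are numbered 1..num_vertices.
std::vector<Edge> kruskal(int num_vertices, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.cost < b.cost; });
    DisjointSets sets(num_vertices);
    std::vector<Edge> mst;
    for (const Edge& e : edges) {
        if (sets.merge(e.v1, e.v2)) mst.push_back(e);  // edge joined two components
    }
    return mst;
}
```

With this shape, `main` never calls `findset` itself, and an edge belongs to the MST exactly when `merge` returns `true`.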