I want to perform some graph clustering and, since I am pretty much bound to Java, I decided to give the Java package JUNG a try. As a simple test graph, I create two clusters of 5 vertices each, with every pair of vertices inside a cluster connected, and then connect the two clusters with a single edge. I would expect the clustering to return 2 clusters of size 5 each, but I get different results. This is the code:

import edu.uci.ics.jung.algorithms.cluster.VoltageClusterer;
import edu.uci.ics.jung.graph.Graph;
import edu.uci.ics.jung.graph.SparseGraph;
import java.io.IOException;
import java.util.Collection;
import java.util.Set;

public class CreateGraph {

public static void main(String[] args) throws IOException {

    // Graph<V, E> where V is the type of the vertices
    // and E is the type of the edges
    Graph<Integer, String> g = new SparseGraph<Integer, String>();

    for (int i = 0; i < 5; i++) {
        g.addVertex((Integer) i);
    }

    for (int i = 0; i < 5; i++) {
        for (int ii = 0; ii < 5; ii++) {
            if (i != ii) {
                g.addEdge("EdgeA-" + i + ii, i, ii);
            }
        }
    }
    // cluster 2
    for (int i = 5; i < 10; i++) {
        g.addVertex((Integer) i);
    }

    for (int i = 5; i < 10; i++) {
        for (int ii = 5; ii < 10; ii++) {
            if (i != ii) {
                g.addEdge("EdgeB-" + i + ii, i, ii);
            }
        }
    }
    System.out.println(g.toString());

    g.addEdge("Edge-connector", 1, 5);

    System.out.println("Creating voltageclusterer");

    VoltageClusterer<Integer, String> vc = new VoltageClusterer<Integer, String>(g, 2);

    Collection<Set<Integer>> clusters = vc.cluster(2);

    for (Set<Integer> s : clusters) {
        System.out.println("set is " + s.size());
        for (Integer ss : s) {
            System.out.println("Element " + ss);
        }
    }
}
}

and the output:

set is 1
Element 8
set is 9
Element 0
Element 1
Element 2
Element 3
Element 4
Element 5
Element 6
Element 7
Element 9

Does anyone have an idea why this happens? (Suggestions for other approaches are also welcome, as long as they are in Java.)

helloworld

1 Answer

VoltageClusterer has a random element: depending on how the dice roll (and how many times you roll them--see below), sometimes the answer that you get will be quite odd, as it was in this case. You can specify the random seed using setRandomSeed().

The reason you ran into this problem is that the Javadoc for the VoltageClusterer constructor is wrong: the numeric parameter that you're passing in is not the number of clusters; it's the number of random samples that are being generated. Sorry about that; we'll fix it.

You almost certainly want to be using more random samples than 2.

The other clustering algorithms for which JUNG has implementations are deterministic. EdgeBetweennessClusterer in particular will partition the graph as you expect if you tell it to remove one edge.
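To see why removing a single edge is enough for this graph, here is a rough plain-Java sketch of the edge-betweenness idea. It is not JUNG's code: the class `BridgeSplitSketch` and all of its methods are made up for illustration, and it uses only `java.util`. It rebuilds the graph from the question, scores each edge by how many shortest paths run through it, removes the top-scoring edge, and reports the connected components that remain.

```java
import java.util.*;

// Hypothetical illustration class; not part of JUNG.
public class BridgeSplitSketch {

    // The graph from the question: two 5-cliques (0-4 and 5-9) joined by edge 1-5.
    static List<Set<Integer>> buildGraph() {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < 10; i++) adj.add(new TreeSet<>());
        for (int base = 0; base < 10; base += 5)
            for (int i = base; i < base + 5; i++)
                for (int j = i + 1; j < base + 5; j++)
                    link(adj, i, j);
        link(adj, 1, 5);
        return adj;
    }

    static void link(List<Set<Integer>> adj, int a, int b) {
        adj.get(a).add(b);
        adj.get(b).add(a);
    }

    // Returns {u, v} with u < v for the edge carrying the most shortest paths
    // (unweighted edge betweenness, accumulated with one BFS per source vertex).
    static int[] highestBetweennessEdge(List<Set<Integer>> adj) {
        int n = adj.size();
        double[][] score = new double[n][n];
        for (int s = 0; s < n; s++) {
            int[] dist = new int[n];
            double[] sigma = new double[n]; // number of shortest paths from s
            double[] delta = new double[n]; // dependency accumulated so far
            Arrays.fill(dist, -1);
            dist[s] = 0;
            sigma[s] = 1;
            List<Integer> order = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty()) {
                int v = queue.poll();
                order.add(v);
                for (int w : adj.get(v)) {
                    if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.add(w); }
                    if (dist[w] == dist[v] + 1) sigma[w] += sigma[v];
                }
            }
            // Walk back from the farthest vertices, crediting each edge on a shortest path.
            for (int i = order.size() - 1; i >= 0; i--) {
                int w = order.get(i);
                for (int v : adj.get(w)) {
                    if (dist[v] == dist[w] - 1) {
                        double c = sigma[v] / sigma[w] * (1 + delta[w]);
                        score[Math.min(v, w)][Math.max(v, w)] += c;
                        delta[v] += c;
                    }
                }
            }
        }
        int[] best = {-1, -1};
        double bestScore = -1;
        for (int u = 0; u < n; u++)
            for (int v = u + 1; v < n; v++)
                if (adj.get(u).contains(v) && score[u][v] > bestScore) {
                    bestScore = score[u][v];
                    best = new int[] {u, v};
                }
        return best;
    }

    // Connected components, found with an iterative depth-first search.
    static List<Set<Integer>> components(List<Set<Integer>> adj) {
        List<Set<Integer>> result = new ArrayList<>();
        boolean[] seen = new boolean[adj.size()];
        for (int s = 0; s < adj.size(); s++) {
            if (seen[s]) continue;
            Set<Integer> comp = new TreeSet<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(s);
            seen[s] = true;
            while (!stack.isEmpty()) {
                int v = stack.pop();
                comp.add(v);
                for (int w : adj.get(v))
                    if (!seen[w]) { seen[w] = true; stack.push(w); }
            }
            result.add(comp);
        }
        return result;
    }

    // Removes the single highest-betweenness edge and returns the resulting clusters.
    static List<Set<Integer>> clusterByRemovingOneEdge() {
        List<Set<Integer>> adj = buildGraph();
        int[] e = highestBetweennessEdge(adj);
        adj.get(e[0]).remove(e[1]);
        adj.get(e[1]).remove(e[0]);
        return components(adj);
    }

    public static void main(String[] args) {
        for (Set<Integer> c : clusterByRemovingOneEdge())
            System.out.println("set is " + c.size() + ": " + c);
    }
}
```

In this graph the connector edge between 1 and 5 lies on every shortest path between the two cliques, so it scores far higher than any edge inside a clique. Removing it leaves the two groups of five. With JUNG itself, the corresponding step is constructing EdgeBetweennessClusterer with 1 as the number of edges to remove.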

Joshua O'Madadhain