Finding cliques or strongly connected components in Apache Spark using Graphx

Question

A clique, C, in an undirected graph G = (V, E) is a subset of the vertices, C ⊆ V, such that every two distinct vertices are adjacent. This is equivalent to the condition that the subgraph of G induced by C is complete. In some cases, the term clique may also refer to the subgraph directly.

I am using GraphX with Apache Spark. Its programming guide shows how to find the connected components of a graph, but not cliques or strongly connected components. How can I do that in Scala? Thanks!

Edit: As suggested in the comments, here is the code I wrote in R for the same task. The problem with using it on Spark is that the recently released SparkR, which lets me use R with Spark, has limited library support (for example, no igraph). That is why I switched to GraphX and Scala, where I now need an equivalent algorithm.

library(igraph)
files <- paste0("NP",1:10,".txt") # Files which represent graphs
func.clique <- function(file)
{
    w <- read.table(file)
    g <- graph.edgelist(cbind(as.character(w$V1),as.character(w$V2)))
    plot(g)
    cli <- cliques(g)
    return (cli)
}
cliquevalues <- sapply(files,func.clique)

Well you have to write the algorithm that does that! What have you tried so far? — eliasah, Jul 04 '15 at 15:19
@eliasah, the thing is, I know how to find it using R. Hell, I even wrote the proper code for it. I tried porting it to Spark with the recently released SparkR library. The only problem is that, since it's fairly new, it has extremely limited support in terms of libraries (for example, `igraph`). I am fairly new to Scala and am learning it right now, so I don't really know the appropriate algorithm for this. Thanks! — John Lui, Jul 04 '15 at 16:24
Then update your question with what you have tried so you can get some help with the code migration! — eliasah, Jul 04 '15 at 16:25
GraphX only handles cliques of size 3 (see [triangle counting](https://spark.apache.org/docs/latest/graphx-programming-guide.html#triangle-counting)). Currently, it doesn't have an algorithm for finding cliques of arbitrary size. — marios, Jul 04 '15 at 16:50
@marios, then how should I go about finding cliques of arbitrary size using Spark (Scala)? So far I have tried SparkR and GraphX. What other options do I have now? Thanks! — John Lui, Jul 04 '15 at 18:00
You can use any Java/Scala graph library that supports clique detection (take a look [here](http://jgrapht.org/javadoc/org/jgrapht/alg/BronKerboschCliqueFinder.html) for example). However, this will not be any faster than what you get from igraph in R. — marios, Jul 05 '15 at 16:41
@marios, much thanks for the help. But, how exactly do I use this library? And why won't it be any faster? Thanks! — John Lui, Jul 05 '15 at 17:24
It will not be distributed/parallelized computation. To achieve that you need to reimplement the algorithm in Spark's api and run it on a cluster. To use the suggested library just include the library's jar in your project. — marios, Jul 05 '15 at 17:44

score 2 · Answer 1 · answered Jul 06 '15 at 11:38

We recently used JGraphT, the library @marios mentions in the comments above. Here is sample code showing how to use it; Vertex is a custom vertex class, and cliques gives you the list of all maximal cliques in the graph:

import org.jgrapht._
import org.jgrapht.graph._
import org.jgrapht.alg._
import scala.collection.JavaConverters._
// Project-local helpers (not part of JGraphT); Vertex is defined in our own code
import Util._
import Constants._
import Implicits._

class CliqueGraph(vertices:List[Vertex],xyEdges:List[(Vertex,Vertex)]){
    val graph = new SimpleGraph[Vertex, DefaultEdge](classOf[DefaultEdge])
    vertices.foreach(v=>graph.addVertex(v))
    xyEdges.foreach{ case(v1,v2) =>
            graph.addEdge(v1,v2)
    }
    lazy val cliques= {
        val c =  new BronKerboschCliqueFinder(graph)
        val setVertices = c.getAllMaximalCliques().asScala
        setVertices.toList
    }
}

In your build.sbt file you need to add the library as a dependency:

libraryDependencies += "org.jgrapht" % "jgrapht-dist" % "0.9.0"
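
If your data is a plain text file with one edge per line (source vertex, then destination vertex), the same idea can be sketched without any library. The block below is a minimal, single-machine sketch of the basic Bron–Kerbosch recursion (no pivoting), the algorithm that BronKerboschCliqueFinder is named after; it is not JGraphT's implementation. The names CliqueSketch, adjacencyFromLines and maximalCliques are made up for illustration, and the whitespace-separated "source destination" line format is an assumption.

```scala
object CliqueSketch {
  // Undirected graph as vertex -> set of neighbours (hypothetical representation).
  type Adjacency = Map[String, Set[String]]

  // Build the adjacency map from lines of the form "source destination".
  // Assumption: fields are whitespace-separated; other lines are skipped.
  def adjacencyFromLines(lines: Iterator[String]): Adjacency = {
    val edges = lines
      .map(_.trim.split("\\s+"))
      .collect { case Array(a, b) if a != b => (a, b) } // drop blank/malformed lines and self-loops
      .toList
    // Store every edge in both directions so the graph is treated as undirected.
    (edges ++ edges.map(_.swap))
      .groupBy(_._1)
      .map { case (v, es) => v -> es.map(_._2).toSet }
  }

  // Basic Bron–Kerbosch: returns every maximal clique exactly once.
  def maximalCliques(adj: Adjacency): List[Set[String]] = {
    // r: clique built so far, p: vertices that can still extend r,
    // x: vertices already handled (used to reject non-maximal cliques).
    def expand(r: Set[String], p: Set[String], x: Set[String]): List[Set[String]] =
      if (p.isEmpty && x.isEmpty) List(r)
      else {
        val (_, _, found) = p.foldLeft((p, x, List.empty[Set[String]])) {
          case ((candidates, excluded, acc), v) =>
            val neighbours = adj.getOrElse(v, Set.empty[String])
            val sub = expand(r + v, candidates & neighbours, excluded & neighbours)
            // v is done: remove it from the candidates and remember it as excluded.
            (candidates - v, excluded + v, acc ++ sub)
        }
        found
      }
    if (adj.isEmpty) Nil else expand(Set.empty, adj.keySet, Set.empty)
  }
}
```

For a file such as NP1.txt from the question, something like CliqueSketch.maximalCliques(CliqueSketch.adjacencyFromLines(scala.io.Source.fromFile("NP1.txt").getLines())) should give the maximal cliques as sets of vertex names. Keep in mind that a graph can have exponentially many maximal cliques, so this is only practical for graphs that fit comfortably on one machine.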

how can I implement your code for my data which is a text file with edges only: source Vertex and destination Vertex? — simtim, Dec 12 '15 at 15:53
