I've a relatively large graph with Vertices: 524 Edges: 1125, of real world transactions. The edges are directed and have a weight (inclusion is optional). I'm trying investigate the various communities within the graph and essentially need a method which:
-Calculates all possible communities
-Calculates the optimum number of communities
-Returns the members/# of members of each (optimum) community
So far I've managed to pull together the following code which plots a color coded graph corresponding to the various communities, however I've no idea how to control the number of communities (i.e plot the top 5 communities with the highest membership) or list the members of a particular community.
library(igraph)
edges <- read.csv('http://dl.dropbox.com/u/23776534/Facebook%20%5BEdges%5D.csv')
all<-graph.data.frame(edges)
summary(all)
all_eb <- edge.betweenness.community(all)
mods <- sapply(0:ecount(all), function(i) {
all2 <- delete.edges(all, all_eb$removed.edges[seq(length=i)])
cl <- clusters(all2)$membership
modularity(all, cl)
})
plot(mods, type="l")
all2<-delete.edges(all, all_eb$removed.edges[seq(length=which.max(mods)-1)])
V(all)$color=clusters(all2)$membership
all$layout <- layout.fruchterman.reingold(all,weight=V(all)$weigth)
plot(all, vertex.size=4, vertex.label=NA, vertex.frame.color="black", edge.color="grey",
edge.arrow.size=0.1,rescale=TRUE,vertex.label=NA, edge.width=.1,vertex.label.font=NA)
Because the edge betweenness method performed so poorly I tried again using the walktrap method:
all_wt<- walktrap.community(all, steps=6,modularity=TRUE,labels=TRUE)
all_wt_memb <- community.to.membership(all, all_wt$merges, steps=which.max(all_wt$modularity)-1)
colbar <- rainbow(20)
col_wt<- colbar[all_wt_memb$membership+1]
l <- layout.fruchterman.reingold(all, niter=100)
plot(all, layout=l, vertex.size=3, vertex.color=col_wt, vertex.label=NA,edge.arrow.size=0.01,
main="Walktrap Method")
all_wt_memb$csize
[1] 176 13 204 24 9 263 16 2 8 4 12 8 9 19 15 3 6 2 1
19 clusters - Much better!
Now say I had a "known cluster" with a list of its members and and wanted to check each of the observed clusters for the presence of members from the "known cluster". Returning the percentage of members found. Unable to finish the following??
list<-read.csv("http://dl.dropbox.com/u/23776534/knownlist.csv")
ength(all_wt_memb$csize) #19
for(i in 1:length(all_wt_memb$csize))
{
match((V(all)[all_wt_memb$membership== i]),list)
}