
I am trying to find communities in tweet data. The cosine similarity between words forms the adjacency matrix, and I build a graph from that adjacency matrix. The task here is to visualize the graph:

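# Packages this code appears to rely on (assumed; the question does not list them):
# tm for DocumentTermMatrix/removeSparseTerms, lsa for cosine, igraph for the graph
library(tm)
library(lsa)
library(igraph)

# Document Term Matrix
dtm = DocumentTermMatrix(tweets)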

### adjust threshold here
dtms = removeSparseTerms(dtm, 0.998)
dim(dtms)

# cosine similarity matrix
t = as.matrix(dtms)

# comparing two word feature vectors
#cosine(t[,"yesterday"], t[,"yet"]) 

numWords = dim(t)[2]

# cosine measure between all column vectors of a matrix.
adjMat = cosine(t)

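# for each word, zero out all but its r+1 largest similarities
# (the self-similarity on the diagonal counts as one of them)
r = 3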
for(i in 1:numWords)
{
  highElement  = sort(adjMat[i,], partial=numWords-r)[numWords-r]
  adjMat[i,][adjMat[i,] <  highElement] = 0
}

# build graph from the adjacency matrix
g = graph.adjacency(adjMat, weighted=TRUE, mode="undirected", diag=FALSE)
V(g)$name

# remove loop and multiple edges
g = simplify(g)
wt = walktrap.community(g, steps=5) # default steps=2
table(membership(wt))

# set vertex color & size
nodecolor = rainbow(length(table(membership(wt))))[as.vector(membership(wt))]
nodesize = as.matrix(round((log2(10*membership(wt)))))
nodelayout = layout.fruchterman.reingold(g,niter=1000,area=vcount(g)^1.1,repulserad=vcount(g)^10.0, weights=NULL)

par(mai=c(0,0,1,0)) 
plot(g, 
     layout=nodelayout,
     vertex.size = nodesize,
     vertex.label=NA,
     vertex.color = nodecolor,
     edge.arrow.size=0.2,
     edge.color="grey",
     edge.width=1)

I just want more space between the separate clusters/communities.

Different communities are shown in different colors.

magarwal
  • please introduce the g or an example of it –  Feb 25 '15 at 10:09
  • Have you tried changing the area of the plot? Default is `area = vcount(graph)^2` (http://www.inside-r.org/packages/cran/igraph/docs/layout) – Jon Cardoso-Silva Feb 25 '15 at 10:32
  • Just updated with the latest code and graph viz. – magarwal Feb 25 '15 at 10:45
  • I don't like the Fruchterman-Reingold algorithm because these things keep happening and I usually don't know how to fix them. What I usually do is export my graph to Gephi and use the Force Atlas 2 algorithm (checking the options 'Prevent Overlap' and 'Dissuade Hubs'); this way I usually get a good visualization of the community structure. I hope someone here will tell you a better way to solve this, and I'll learn it too. – Jon Cardoso-Silva Feb 25 '15 at 10:50
  • @jonathancardoso - how do you export a graph from R to Gephi? Gephi looks pretty interesting! – magarwal Feb 25 '15 at 11:43
  • @magarwal, I usually do `write.graph(g,file="graph.gml",format="gml")`, using igraph, then I load the GML file into Gephi. – Jon Cardoso-Silva Feb 25 '15 at 11:47
  • I would try increasing edge weights within communities. That way nodes in the same community are kept together. – Gabor Csardi Feb 25 '15 at 15:10

1 Answer


To the best of my knowledge, igraph alone cannot lay out vertices of the same community close to each other. I have implemented this in my package NetPathMiner, but installing the whole package just for the visualization function is a bit of a hassle, so I will write a simple version of it here and explain what it does.

layout.by.attr <- function(graph, wc, cluster.strength=1,layout=layout.auto) {  
        g <- graph.edgelist(get.edgelist(graph)) # create a lightweight copy of graph w/o the attributes.
        E(g)$weight <- 1

        attr <- cbind(id=1:vcount(g), val=wc)
        g <- g + vertices(unique(attr[,2])) + igraph::edges(unlist(t(attr)), weight=cluster.strength)

        l <- layout(g, weights=E(g)$weight)[1:vcount(graph),]
        return(l)
}

Basically, for each community the function adds one extra vertex that is connected to all vertices belonging to that community. The layout is then calculated on this new graph. Since the members of each community are now connected through a common vertex, they tend to cluster together.

As Gabor said in the comments, increasing edge weights has a similar effect. The function makes use of this: increasing cluster.strength gives the edges between the added vertices and their communities higher weights.
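
The edge-weight idea can also be sketched on its own, without helper vertices. The example below is a minimal illustration and not part of the original answer. It assumes igraph 1.0 or later (so it uses `layout_with_fr`, `cluster_walktrap`, `sample_islands` and `ends` rather than the older dotted names), a randomly generated toy graph, and an arbitrary weight of 10 for within-community edges.

```r
library(igraph)

set.seed(42)
# Toy graph (hypothetical data): 3 dense groups of 10 vertices, sparsely linked
g <- sample_islands(islands.n = 3, islands.size = 10,
                    islands.pin = 0.6, n.inter = 2)
memb <- as.vector(membership(cluster_walktrap(g)))

# Endpoints of every edge as numeric vertex ids (one row per edge)
el <- ends(g, E(g), names = FALSE)
same <- memb[el[, 1]] == memb[el[, 2]]

# Heavier weight for edges inside a community, 1 for edges between communities
w <- ifelse(same, 10, 1)

# Fruchterman-Reingold treats larger weights as stronger attraction
l <- layout_with_fr(g, weights = w)
plot(g, layout = l, vertex.color = memb, vertex.label = NA, vertex.size = 5)
```

Raising the within-community weight should tighten each group. The value 10 is only a starting point to tune by eye.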

If this is still not enough, you can extend this principle (calculating the layout on a more connected graph) by adding edges between all vertices of the same community (forming a clique). In my experience, this is a bit of overkill.
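
For completeness, the clique variant might look roughly like the sketch below. It is an illustration under assumptions, not code from NetPathMiner: it assumes igraph 1.0 or later and a random toy graph. The layout is computed on a temporary graph `h` (a hypothetical name) that has the extra within-community edges, and the original graph is then drawn with that layout.

```r
library(igraph)

set.seed(7)
# Toy graph (hypothetical data): 3 groups of 8 vertices
g <- sample_islands(islands.n = 3, islands.size = 8,
                    islands.pin = 0.5, n.inter = 2)
memb <- as.vector(membership(cluster_walktrap(g)))

# All vertex pairs that share a community, flattened as from1, to1, from2, to2, ...
groups <- split(seq_len(vcount(g)), memb)
pairs <- unlist(lapply(groups, function(v) {
  if (length(v) > 1) combn(v, 2) else NULL
}), use.names = FALSE)

# Temporary graph with a clique per community; simplify() drops duplicate edges
h <- simplify(add_edges(g, pairs))

# Compute the layout on the denser graph, but draw the original edges only
l <- layout_with_fr(h)
plot(g, layout = l, vertex.color = memb, vertex.label = NA, vertex.size = 5)
```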

ahmohamed
  • Hey there, I am getting an error "Error in layout(g, weights = E(g)$weight)[1:(vcount(graph)), ] : subscript out of bounds". Has anyone ever encountered this? – Brofessor Aug 23 '21 at 05:20
  • Also, was this line of the code "g <- graph.edgelist(get.edgelist(graph)) # create a lightweight copy of graph w/o the attributes." intended to make a perfect copy of the input graph? Was it okay to lose nodes through the process? – Brofessor Aug 23 '21 at 05:36
  • I suspect your graph has disconnected nodes (with no edges to any other nodes), in which case they will not be present in the new graph (which is created from edges only). I admit I haven't accounted for this edge case. In your case, can you try simply copying your graph with `g <- graph` instead? – ahmohamed Aug 23 '21 at 13:18
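
One way to sidestep the isolated-node problem discussed in these comments is to rebuild the working graph from the vertex count instead of from the edge list alone. The function below is a hedged sketch of that idea, not the answer author's code. `layout_by_community` is a hypothetical name, igraph 1.0 or later is assumed, and the layout function passed in is assumed to accept a `weights` argument (as `layout_with_fr` does).

```r
library(igraph)

# Hypothetical variant of layout.by.attr that keeps vertices without edges
layout_by_community <- function(graph, wc, cluster.strength = 1,
                                layout = layout_with_fr) {
  n <- vcount(graph)

  # Start from n bare vertices so isolated ones are not lost, then copy the edges
  g <- make_empty_graph(n, directed = FALSE)
  g <- add_edges(g, as.vector(t(as_edgelist(graph, names = FALSE))))
  E(g)$weight <- 1

  # One helper vertex per community, numbered n + 1, n + 2, ...
  comm <- as.integer(factor(as.vector(wc)))
  g <- add_vertices(g, max(comm))

  # Link every original vertex to the helper vertex of its community
  g <- add_edges(g, as.vector(rbind(seq_len(n), n + comm)),
                 weight = cluster.strength)

  # Lay out the augmented graph and keep only the rows of the original vertices
  layout(g, weights = E(g)$weight)[seq_len(n), , drop = FALSE]
}

# Usage on a toy graph (hypothetical data) with two isolated vertices
set.seed(1)
g0 <- add_vertices(sample_islands(3, 8, 0.7, 1), 2)
memb <- c(rep(1:3, each = 8), 4, 4)  # hand-assigned communities for illustration
l <- layout_by_community(g0, memb, cluster.strength = 5)
plot(g0, layout = l, vertex.color = memb, vertex.label = NA, vertex.size = 5)
```

Because the helper vertices are appended after the original ones, the first `vcount(graph)` rows of the layout always exist, which avoids the "subscript out of bounds" error.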