0

I am trying to plot the Network of the Word Distributions Over Topics (Topic Relation). using this code [source]:

post <- topicmodels::posterior(ldaOut)

cor_mat <- cor(t(post[["terms"]]))
cor_mat[ cor_mat < .05 ] <- 0
diag(cor_mat) <- 0

graph <- graph.adjacency(cor_mat, weighted=TRUE, mode="lower")
graph <- delete.edges(graph, E(graph)[ weight < 0.05])

E(graph)$edge.width <- E(graph)$weight*20
V(graph)$label <- paste("Topic", V(graph))
V(graph)$size <- colSums(post[["topics"]]) * 15

par(mar=c(0, 0, 3, 0))
set.seed(110)
plot.igraph(graph, edge.width = E(graph)$edge.width, 
    edge.color = "orange", vertex.color = "orange", 
    vertex.frame.color = NA, vertex.label.color = "grey30")
title("Strength Between Topics Based On Word Probabilities", cex.main=.8)

Sample of cor_mat data:

          1          2          3          4          5          6          7       ...
1  0.00000000 0.00000000 0.00000000 0.09612831 0.00000000 0.17248020 0.00000000
2  0.00000000 0.00000000 0.07206496 0.00000000 0.00000000 0.05755187 0.00000000
3  0.00000000 0.07206496 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
4  0.09612831 0.00000000 0.00000000 0.00000000 0.08459681 0.00000000 0.06895900
5  0.00000000 0.00000000 0.00000000 0.08459681 0.00000000 0.00000000 0.00000000
6  0.17248020 0.05755187 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
7  0.00000000 0.00000000 0.00000000 0.06895900 0.00000000 0.00000000 0.00000000
8  0.00000000 0.00000000 0.00000000 0.00000000 0.54849308 0.00000000 0.00000000
9  0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.09745720 0.00000000
10 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
11 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
12 0.00000000 0.00000000 0.00000000 0.10329825 0.00000000 0.14057310 0.00000000
13 0.14664201 0.00000000 0.00000000 0.00000000 0.05803984 0.00000000 0.00000000
14 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
15 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
16 0.00000000 0.00000000 0.10290656 0.00000000 0.00000000 0.00000000 0.06293238
17 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
18 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
19 0.00000000 0.00000000 0.00000000 0.00000000 0.33483481 0.00000000 0.00000000
20 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
21 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
22 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.27720724 0.00000000
23 0.12487435 0.14806837 0.00000000 0.10355990 0.00000000 0.05086977 0.00000000
24 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.06622769 0.00000000

Unfortunately, the plot looks like this: enter image description here

Any idea on how to make the network of topics more elegant, showing the links between topics rather than making them overlap each other?

Sultan
  • 189
  • 2
  • 9
  • Post example of you data `ldaOut`. – pogibas Sep 06 '17 at 19:27
  • I updated the question with sample of the data. – Sultan Sep 06 '17 at 19:36
  • what is `post`? – pogibas Sep 06 '17 at 19:43
  • It is the results of the topic model (lda). A Collection of the terms probabilities, and topics probabilities assigned to each document . – Sultan Sep 06 '17 at 19:49
  • Please post reproducible example or no-one will be able to answer your question. You can try generate it using one of the R datasets (eg., `mtcars`) – pogibas Sep 06 '17 at 19:58
  • You could reduce the node size, place the nodes further apart in absolute terms (ie use layout.norm after your chosen layout), output to a larger format (by setting width, height parameters).. – user20650 Sep 06 '17 at 20:05
  • I found out how to fix, the issue, the code posted on GitHub was suitable for certain data-set. thanks – Sultan Sep 06 '17 at 20:05

1 Answers1

0

The simple solution is to change the numbers weight*20 and colSums(post[["topics"]])*15 to smaller numbers in order to avoid overlap issue. The code could be like this

...    
E(graph)$edge.width <- E(graph)$weight* 5
V(graph)$label <- paste("Topic", V(graph))
V(graph)$size <- colSums(post[["topics"]]) * 2
...

And the result of that, enter image description here

Sultan
  • 189
  • 2
  • 9