I am trying to make a co-occurrence network graph for my presence/absence data of bacteria species, but I am unsure how to go about it. I'm hoping to end up with a graph where each species is linked to another species if both are present in the same patient, with a larger circle for higher-frequency species. I originally tried the widyr and tidygraph packages, but I'm not sure my data set is compatible with them, as it has the patients as columns and the individual species as rows. Ideally I would like to know which packages/code would work with my data set, or how I could reshape my data set to work with these packages.
2 Answers
3
You can use a matrix cross product to get a co-occurrence matrix. It is then simple to convert that adjacency matrix into a graph with the igraph package. Try this:
library(igraph)
# Create fake data set
# rows = patients
# cols = species
set.seed(2222)
df <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
colnames(df) <- letters[1:10]
# Generate co-occurrence matrix with crossproduct
co_mat <- t(df) %*% df
# Set diagonal values to 0
diag(co_mat) <- 0
# Assign dim names
dimnames(co_mat) <- list(colnames(df), colnames(df))
# Create graph from adjacency matrix
# ! edge weights are equal to frequency of co-occurrence
g <- graph_from_adjacency_matrix(co_mat, mode = "upper", weighted = TRUE)
# Assign nodes weight equal to species frequency
g <- set_vertex_attr(g, "v_weight", value = colSums(df))
plot(g, vertex.size = V(g)$v_weight * 5 + 5, edge.width = E(g)$weight * 5)
Here is our fake data
a b c d e f g h i j
[1,] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
[2,] TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE
[3,] FALSE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
[4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
[5,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE
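To see what the cross product does with this table, here is a small hand check. It is only a sketch: `x` is the first three columns (a, b, c) of the fake data retyped as 0/1, and the names `x` and `co` are made up for illustration.

```r
# Columns a, b, c of the fake data above, retyped by hand (1 = present)
x <- cbind(
  a = c(1, 1, 0, 0, 0),
  b = c(1, 0, 1, 0, 1),
  c = c(1, 0, 0, 0, 0)
)

# Cross product: entry [i, j] counts the patients that carry both species i and j
co <- t(x) %*% x

co["a", "b"]  # 1: only patient 1 carries both a and b
diag(co)      # a = 2, b = 3, c = 1: how many patients carry each species
```

The diagonal holds each species' own frequency, which is why the answer sets it to 0 before building the graph and uses `colSums(df)` for the node sizes instead.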
And here is a result:

Istrel
-
Thank you for the answer. I just wanted to clarify whether the fake data set you used is the one shown, or whether you manipulated it in any way to turn rows into patients and cols into species? – J.Dyer Feb 11 '19 at 13:34
-
The data shown is the initial data, with no manipulation. You can transpose your data frame with `t()` – Istrel Feb 11 '19 at 14:44
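A minimal sketch of that transpose step, assuming the asker's table has one row per species and one column per patient. The data frame `raw` and its column names are hypothetical, invented only to illustrate the reshaping:

```r
# Hypothetical species-by-patient table (all names are made up)
raw <- data.frame(
  species  = c("sp_a", "sp_b", "sp_c"),
  patient1 = c(1, 0, 1),
  patient2 = c(1, 1, 0),
  patient3 = c(0, 1, 0)
)

# Keep only the 0/1 columns and carry the species names as row names
m <- as.matrix(raw[, -1])
rownames(m) <- raw$species

# Transpose so that rows = patients and cols = species, as in the answer
df <- t(m) > 0

dim(df)       # 3 patients x 3 species
colnames(df)  # the species names
```

After this, `df` has the same layout as the fake data in the answer, so the rest of that code should apply unchanged.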
-
BTW, don't forget to upvote answers if they were useful to you. – Istrel Feb 11 '19 at 14:55
-
Thank you again, just one more question though, is there any way to show the required data for a figure legend, such as significance or a key for line width/circle size? – J.Dyer Feb 11 '19 at 23:19
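One possible way to get a key for line width with the base igraph plot is the base graphics `legend()` function. This is a rough sketch, not part of the original answer: the small matrix `df` is hypothetical, and the legend only covers edge width, because igraph's `vertex.size` units do not map directly onto the `pt.cex` scale of `legend()`.

```r
library(igraph)

# Hypothetical presence/absence data: rows = patients, cols = species
df <- cbind(
  a = c(1, 1, 1, 0),
  b = c(1, 1, 0, 1),
  c = c(0, 1, 1, 0)
)

# Same steps as in the answer: co-occurrence matrix, then weighted graph
co_mat <- t(df) %*% df
diag(co_mat) <- 0
g <- graph_from_adjacency_matrix(co_mat, mode = "upper", weighted = TRUE)

# Distinct co-occurrence counts that appear as edge weights
edge_w <- sort(unique(E(g)$weight))

plot(g, vertex.size = colSums(df) * 5 + 5, edge.width = E(g)$weight * 5)

# Key for edge width: one line per distinct weight, using the same scaling
legend("topleft", legend = paste(edge_w, "shared patient(s)"),
       lwd = edge_w * 5, title = "Edge width", bty = "n")
```

A significance key would need a statistical test per species pair, which neither answer computes, so it is not attempted here.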
1
Like Istrel, I would also recommend igraph. Here is a second solution using ggplot2:
library(ggnetwork)
library(ggplot2)
library(igraph)
#sample data:
set.seed(1)
mat <- matrix(rbinom(100 * 15, 1, 0.1), ncol = 15, nrow = 100)
# The rows of mat become the nodes (see mat %*% t(mat) below). If your species are stored in columns, transpose first:
#mat <- t(mat)
#### Optional! There are often "empty cases" which you might want to remove:
# remove 0-sum-columns
mat <- mat[,apply(mat, 2, function(x) !all(x==0))]
# remove 0-sum-rows
mat <- mat[apply(mat, 1, function(x) !all(x==0)),]
# transform in term-term adjacency matrix
mat.t <- mat %*% t(mat)
##### calculate graph
g <- igraph::graph_from_adjacency_matrix(mat.t, mode = "undirected", weighted = TRUE, diag = FALSE)
# calculate coordinates (see https://igraph.org/r/doc/layout_.html for different layouts)
layout <- as.matrix(layout_with_lgl(g))
p <- ggplot(g, layout = layout, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = "grey20", alpha = 0.2, size = 2) + # add e.g. curvature = 0.15 for curved edges
  geom_nodes(size = (centralization.degree(g)$res + 3), color = "darkolivegreen4", alpha = 1) +
  geom_nodes(size = centralization.degree(g)$res, color = "darkolivegreen2", alpha = 1) +
  geom_nodetext(aes(label = vertex.names), size = 5) +
  theme_blank()
p
To get a legend for the circle size, map it through ggplot aesthetics:
# calculate degree:
V(g)$Degree <- centralization.degree(g)$res
p <- ggplot(g, layout = layout, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_edges(color = "grey20", alpha = 0.2, size = 2) + # add e.g. curvature = 0.15 for curved edges
  geom_nodes(aes(size = Degree), color = "darkolivegreen2", alpha = 1) +
  scale_size_continuous(range = c(5, 16)) +
  geom_nodetext(aes(label = vertex.names), size = 5) +
  theme_blank()
p

everen
-
Just one more question, is there any way to show the required data for a figure legend, such as significance or a key for line width/circle size? – J.Dyer Feb 11 '19 at 23:20
-
You can use the ggplot2 `aes()` grammar for this! I added an example to my answer. – everen Feb 12 '19 at 11:11
-
I tried this with my own data and got the following on the last part of the code, at `p`: Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class igraph. I then tried the exact code you used and got the same error. Could you help, please? – J.Dyer Feb 12 '19 at 16:07
-
I think the error occurs because you haven't loaded the ggnetwork package (`library(ggnetwork)`). Install it if necessary (`install.packages("ggnetwork")`). – everen Feb 13 '19 at 12:17
-
I have re-installed all of the packages (ggnetwork, ggplot2 and igraph) but I still get the same error message. I'm really sorry about this but do you have any other suggestions? – J.Dyer Feb 13 '19 at 12:44
-
I have tried with both so far; my data gets a different error though. I'll run it again and post the results. – J.Dyer Feb 13 '19 at 14:08
-
So your data still gives the same error, and then my data can't even turn into a matrix anymore but I think I can sort that out myself. – J.Dyer Feb 14 '19 at 12:38