
I have a standard data.frame with some categorical columns and one numeric column. It represents a nested experimental design (though that doesn't really matter), like this:

set.seed(1234)
data = data.frame(toplevel=c("A","A","A","A", "B", "B", "B"), 
                  second = c("A1", "A1", "A2", "A2", "B1", "B1", "B2"),
                  experiments = paste0("exp_00", 1:7),
                  values = runif(7, 1,100))
####   toplevel second experiments   values
#### 1        A     A1     exp_001 12.25664
#### 2        A     A1     exp_002 62.60764
#### 3        A     A2     exp_003 61.31820
#### 4        A     A2     exp_004 62.71456
#### 5        B     B1     exp_005 86.23062
#### ...

I would like to make the same plot using the code from this page (the left plot!): https://www.r-graph-gallery.com/314-custom-circle-packing-with-several-levels/

I don't know how to turn my data frame into an igraph object and then use the suggested plotting code (I don't have a from and to column...). Given my example data, my desired output would look like the plot on the right, with circle size representing the values column. I tried using graph_from_data_frame, without success.

Thanks!

Edit: my attempted code so far (I only get part of the graph?):

library(tidyverse); library(igraph); library(ggraph)
edges = rbind.data.frame(data[,1:2] %>% setNames(c("from", "to")), data[, 2:3] %>% setNames(c("from", "to")))
vertices = bind_rows(
  data %>% group_by(toplevel) %>% summarize(values=sum(values)) %>% select(name=toplevel, values),
  data %>% group_by(second) %>% summarize(values=sum(values)) %>% select(name=second, values),
  data %>% select(name=experiments, values)
)
mygraph=graph_from_data_frame(edges, directed = TRUE, vertices = vertices)
# ggraph >= 2.0 takes an unquoted column name for weight (older versions took weight = "values")
ggraph(mygraph, layout = 'circlepack', weight = values) + 
  geom_node_circle() +
  theme_void()
agenis

1 Answer

I think your current attempt is definitely on the right track. One trick that may help is adding an extra "root" node that both of your top-level nodes connect to. At the moment, because there is no connection at all between your top-level nodes, the graph consists of two separate trees and ggraph only plots one of them:

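library(tidyverse); library(igraph); library(ggraph)

# Parent -> child edges for each level, plus a "root" node above A and B
# so that the whole hierarchy forms a single tree
edges = rbind.data.frame(
    data[, 1:2] %>% setNames(c("from", "to")), 
    data[, 2:3] %>% setNames(c("from", "to")),
    data.frame(from = c("root", "root"), to = c("A", "B"))) %>%
    distinct()  # drop the repeated toplevel -> second pairs
# One row per vertex; the first column must hold the vertex names
vertices = bind_rows(
    data.frame(name = "root", values = sum(data$values)),
    data %>% group_by(toplevel) %>% summarize(values=sum(values)) %>% select(name=toplevel, values),
    data %>% group_by(second) %>% summarize(values=sum(values)) %>% select(name=second, values),
    data %>% select(name=experiments, values)
)
mygraph=graph_from_data_frame(edges, directed = TRUE, vertices = vertices)
# ggraph >= 2.0 takes an unquoted column name for weight (older versions took weight = "values")
ggraph(mygraph, layout = 'circlepack', weight = values) + 
    geom_node_circle(aes(fill = depth)) +
    geom_node_label(aes(label = name)) +
    theme_void()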
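To see why the extra root matters, the sketch below uses only igraph to compare the connected components of the edge list with and without the root. Without it, `A` and `B` sit in two separate trees, whereas a hierarchical layout such as circle packing expects a single tree with one root vertex. The object names (`edges_no_root`, `edges_root`, `g_no_root`, `g_root`) are made up for this illustration, and `stringsAsFactors = FALSE` is an assumption added so the example behaves the same on R versions before 4.0.

```r
library(igraph)

# Same example data as in the question
set.seed(1234)
data <- data.frame(toplevel = c("A", "A", "A", "A", "B", "B", "B"),
                   second = c("A1", "A1", "A2", "A2", "B1", "B1", "B2"),
                   experiments = paste0("exp_00", 1:7),
                   values = runif(7, 1, 100),
                   stringsAsFactors = FALSE)

# Parent -> child edges without a root (as in the question's attempt)
edges_no_root <- unique(rbind(
  setNames(data[, c("toplevel", "second")], c("from", "to")),
  setNames(data[, c("second", "experiments")], c("from", "to"))
))
# The same edges plus a "root" vertex linked to every top-level node
edges_root <- rbind(
  edges_no_root,
  data.frame(from = "root", to = unique(data$toplevel), stringsAsFactors = FALSE)
)

g_no_root <- graph_from_data_frame(edges_no_root, directed = TRUE)
g_root <- graph_from_data_frame(edges_root, directed = TRUE)

# Number of (weakly) connected components: two trees vs. one tree
components(g_no_root)$no  # 2
components(g_root)$no     # 1

# Vertices with no incoming edge are the roots of the hierarchy
V(g_no_root)$name[degree(g_no_root, mode = "in") == 0]  # "A" "B"
V(g_root)$name[degree(g_root, mode = "in") == 0]        # "root"
```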
Marius
  • Oh OK, I see the problem; now it plots the whole graph correctly, with the colors. Though I was wondering, isn't there an easier way to create a graph from such a data frame, without all my complicated code? – agenis Mar 21 '19 at 00:17
  • I don't know, most of the "complicated code" is just reshaping the data. If the tools you're using expect the data in a different shape to what you start with, you'll always need to do a bit of work to transform it. Looking at it again, I think you only actually need `values` to be set for the lowest-level 'experiment' vertices, since `ggraph` will calculate the totals for higher levels automatically. So you could save a bit of work on the calculation of totals. – Marius Mar 21 '19 at 03:17
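Following up on the comments, one way to cut down the reshaping is to wrap it in a small function that works for any number of nesting levels. This is a sketch under stated assumptions: `hierarchy_to_graph()` is a hypothetical helper, not part of igraph or ggraph; it assumes each name appears at only one level and that leaf names are unique; and, following the last comment, it only sets `values` on the leaves (0 elsewhere), relying on ggraph's circlepack layout to size the higher-level circles from their children.

```r
library(igraph)

# Hypothetical helper (not part of igraph or ggraph): turns a data frame whose
# columns `levels` describe a hierarchy, top to bottom, into an igraph tree.
# A `root` vertex is added above the first level so the result is one tree.
# Only leaves (last level) carry a value; internal vertices get 0, assuming
# the circlepack layout derives their size from their children.
hierarchy_to_graph <- function(df, levels, value_col, root = "root") {
  # One from/to data frame per pair of adjacent levels
  pairs <- lapply(seq_len(length(levels) - 1), function(i) {
    data.frame(from = as.character(df[[levels[i]]]),
               to = as.character(df[[levels[i + 1]]]),
               stringsAsFactors = FALSE)
  })
  top <- unique(as.character(df[[levels[1]]]))
  root_edges <- data.frame(from = root, to = top, stringsAsFactors = FALSE)
  # Stack all edges and drop repeated parent -> child pairs
  edges <- unique(do.call(rbind, c(list(root_edges), pairs)))

  # Vertex table: first column is the name; leaves get their value, others 0
  leaves <- as.character(df[[levels[length(levels)]]])
  all_names <- unique(c(edges$from, edges$to))
  vertices <- data.frame(name = all_names,
                         values = df[[value_col]][match(all_names, leaves)],
                         stringsAsFactors = FALSE)
  vertices$values[is.na(vertices$values)] <- 0

  graph_from_data_frame(edges, directed = TRUE, vertices = vertices)
}

# Usage with the question's example data
set.seed(1234)
data <- data.frame(toplevel = c("A", "A", "A", "A", "B", "B", "B"),
                   second = c("A1", "A1", "A2", "A2", "B1", "B1", "B2"),
                   experiments = paste0("exp_00", 1:7),
                   values = runif(7, 1, 100),
                   stringsAsFactors = FALSE)

g <- hierarchy_to_graph(data, c("toplevel", "second", "experiments"), "values")
vcount(g)  # 14: root + 2 top-level + 4 second-level + 7 experiments
ecount(g)  # 13: a tree with 14 vertices
```

The resulting graph could then go into the same plotting code as in the answer, e.g. `ggraph(g, layout = 'circlepack', weight = values) + geom_node_circle(aes(fill = depth)) + theme_void()`, assuming ggraph 2.0 or later (which takes an unquoted column for `weight`). Adding a fourth nesting level would only mean adding its column name to `levels`.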