
I am trying to create a hierarchical graph (like a tree, but ideally laid out in a circle) that shows a three-level connection: from each individual sample, to its species, to the label the species falls under. The point is to emphasize that these are all different species that share the same label. However, when I create the graph, the labels overlap and everything is so condensed that you can't tell the parts apart. This doesn't happen with a smaller set of around 20 samples and fewer species, but my full data set has 121 samples, which leads to the overlap. How can I fix the crowding with a set this large? For aesthetics I would like a circular layout, but I also tried a non-circular one and the labels still overlapped. Example data:

Sample Species Label 
L1 Shark Seafood 
L2 Tuna Seafood 
L3 Shark Seafood 
L4 Shrimp Seafood
L5 Crab Seafood 
L6 Squid Seafood 
L7 Shrimp Seafood 
L8 Shark Seafood 
L9 Shark Seafood 
L10 Crab Seafood 
L11 Tuna Seafood 
L12 Shrimp Seafood 
L13 Crab Seafood 
L14 Crab Seafood 
L15 Shark Seafood 
L16 Tuna Seafood 
L17 Squid Seafood 
L18 Shark Seafood 
L19 Squid Seafood 
L20 Shrimp Seafood 
L21 Shark Seafood 
L22 Tuna Seafood 
L23 Shark Seafood 
L24 Shrimp Seafood 
L25 Crab Seafood 
L26 Tuna Seafood 
L27 Shrimp Seafood 
L28 Shark Seafood 
L29 Shark Seafood 
L30 Crab Seafood 
L31 Tuna Seafood 
L32 Shrimp Seafood 
L33 Crab Seafood 
L34 Crab Seafood 
L35 Lobster Seafood 
L36 Tuna Seafood 
L37 Tuna Seafood 
L38 Shark Seafood 
L39 Shark Seafood 
L40 Shrimp Seafood 
L41 Shark Seafood 
L42 Tuna Seafood 
L43 Shark Seafood 
L44 Shrimp Seafood 
L45 Crab Seafood 
L46 Tuna Seafood 
L47 Shrimp Seafood 
L48 Shark Seafood 
L49 Shark Seafood 
L50 Crab Seafood 
L51 Tuna Seafood 
L52 Shrimp Seafood 
L53 Crab Seafood 
L54 Crab Seafood 
L55 Shark Seafood 
L56 Cod Seafood 
L57 Tuna Seafood 
L58 Shark Seafood 
L59 Shark Seafood 
L60 Shrimp Seafood 
L61 Shark Seafood 
L62 Cod Seafood 
L63 Shark Seafood 
L64 Shrimp Seafood 
L65 Crab Seafood 
L66 Tuna Seafood 
L67 Shrimp Seafood 
L68 Shark Seafood 
L69 Shark Seafood 
L70 Crab Seafood 
L71 Tuna Seafood 
L72 Cod Seafood 
L73 Lobster Seafood 
L74 Crab Seafood 
L75 Shark Seafood 
L76 Tuna Seafood 
L77 Lobster Seafood 
L78 Shark Seafood 
L79 Shark Seafood 
L80 Shrimp Seafood 
L81 Shark Seafood 
L82 Lobster Seafood 
L83 Shark Seafood 
L84 Shrimp Seafood 
L85 Salmon Seafood 
L86 Tuna Seafood 
L87 Salmon Seafood 
L88 Shark Seafood 
L89 Blowfish Seafood 
L90 Flounder Seafood 
L91 Tuna Seafood 
L92 Shrimp Seafood 
L93 Crab Seafood 
L94 Lobster Seafood 
L95 Shark Seafood 
L96 Tuna Seafood 
L97 Blowfish Seafood 
L98 Shark Seafood 
L99 Shark Seafood 
L100 Flounder Seafood 
L101 Shark Seafood 
L102 Tuna Seafood 
L103 Shark Seafood 
L104 Salmon Seafood 
L105 Crab Seafood 
L106 Salmon Seafood 
L107 Shrimp Seafood 
L108 Shark Seafood 
L109 Shark Seafood 
L110 Crab Seafood 
L111 Tuna Seafood 
L112 Shrimp Seafood 
L113 Crab Seafood 
L114 Flounder Seafood 
L115 Shark Seafood 
L116 Tuna Seafood 
L117 Tuna Seafood 
L118 Shark Seafood 
L119 Shark Seafood 
L120 Shrimp Seafood 
L121 Shrimp Seafood 

The code I have so far is:

library(tidygraph)
library(ggraph)
library(extrafont)

df1 <- Example[2:1]
df1$Sample <- paste(df1$Species, df1$Sample)
df2 <- Example[3:2]
names(df2) <- names(df1)

as_tbl_graph(rbind(df1, df2)) %>%
  activate(nodes)%>%
  mutate(group = ifelse(grepl(' ', name), sub(' .*$', '', name), name),
         name = ifelse(grepl(' ', name), sub('^.* ', '', name), name)) %>%
  ggraph(layout = 'igraph', algorithm = 'tree', circular = TRUE) +
  geom_edge_arc(aes(color = factor(from)), width = 2, alpha = 0.3) +
  geom_node_circle(aes(r = nchar(name)/45, fill = group), color = NA) +
  geom_node_text(aes(label = name), fontface = 2,
                 size = 6, color = "gray30") +
  theme_graph() +
  coord_equal() +
  scale_edge_color_brewer(palette = 'Pastel2') +
  scale_fill_brewer(palette = 'Pastel2', 
                    limits = c('Shark', 'Tuna', 'Shrimp', 'Crab', 'Seafood', 'Cod', 'Blowfish', 'Lobster', 'Squid', 'Flounder', 'Salmon')) +
  theme(legend.position = 'none')

What I end up getting with this code is this: (image not shown)

What I would like to have, but with my larger data set: (image not shown)

Any tips/insight would be super helpful!

IzOss
  • General questions about data visualization are a better fit for [Cross Validated](https://stats.stackexchange.com/help/on-topic). Stack Overflow is for specific programming questions and it doesn't sound like you know for sure what output you want. There are only so many pixels in an image so I'm not sure what you want to do with all those text labels. Have you tried just making the output image larger and the font size smaller? – MrFlick Dec 12 '22 at 21:56
  • Mr.Flick, I do think my question is a specific programming question, but I guess I framed my title too general. I have tried doing a larger output box and smaller font and both those don't change my clustering issue which is why I created this post because I think I am missing a piece of code that could adjust the clustering or hopefully stop it. The second photo I have posted is the output I want but I only get that if I use a small data set, but I do want all those text labels as they help visualize my data. – IzOss Dec 13 '22 at 23:35
  • To clarify your question: how many nodes do you want to place at the outer ring? – clp Dec 14 '22 at 14:05
  • I would like 121 nodes in the outer ring, 10 nodes in the second ring and 1 in the third – IzOss Dec 16 '22 at 20:53

1 Answer


For a quick fix, just lower the sizes used in geom_node_text and geom_node_circle. Your palette also doesn't have enough colors for all your groups: Pastel2 provides at most 8, while you have 11 groups (10 species plus the Seafood label), so I suggest picking a palette with enough colors. The default palette has more. Your method for coloring the arcs won't work either. That can be fixed by switching to geom_edge_arc2, which lets you color the arcs by node attributes:

# df1 and df2 are the two edge lists built in the question
as_tbl_graph(rbind(df1, df2)) %>%
  activate(nodes) %>%
  mutate(group = ifelse(grepl(' ', name), sub(' .*$', '', name), name),
         name = ifelse(grepl(' ', name), sub('^.* ', '', name), name)) %>%
  ggraph(layout = 'igraph', algorithm = 'tree', circular = TRUE) +
  # the *2 edge geoms expose node attributes with a "node." prefix
  geom_edge_arc2(aes(color = node.group), width = 2, alpha = 0.3) +
  # a larger divisor gives smaller circles (was nchar(name)/45)
  geom_node_circle(aes(r = nchar(name)/100, fill = group), color = NA) +
  # lower size here as well if the labels still overlap
  geom_node_text(aes(label = name), fontface = 2,
                 size = 6, color = "gray30") +
  theme_graph() +
  coord_equal()
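
If you would rather keep a pastel look than fall back to the default colors, one option is to generate a palette with exactly as many colors as you have groups. The sketch below is only an illustration: it assumes the 11 group names from the question, uses base R's hcl.colors() with its "Pastel 1" palette as a stand-in for Pastel2, and the names groups and group_colors are made up for this example.

```r
# Group names taken from the example data in the question:
# 10 species plus the "Seafood" label.
groups <- c("Shark", "Tuna", "Shrimp", "Crab", "Seafood", "Cod",
            "Blowfish", "Lobster", "Squid", "Flounder", "Salmon")

# hcl.colors() ships with base R (grDevices, R >= 3.6) and returns as many
# colors as requested, so all 11 groups get their own color.
# "Pastel 1" is an assumption here, chosen only because it resembles Pastel2.
group_colors <- setNames(
  hcl.colors(length(groups), palette = "Pastel 1"),
  groups
)

# Assumed usage: add these two lines to the plot in place of the
# brewer scales from the question.
#   scale_fill_manual(values = group_colors) +
#   scale_edge_color_manual(values = group_colors)
```

Because the vector is named, each group keeps the same color for both the node fill and the edge color.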