
I work in Genetics and very recently started practicing R, ggplot in particular, to make images for publication.

I ran several analyses and combined 3 datasets into a single table (`Attempt_mod_TopTen`) containing:

  • y axis (GO_Term) = names that have to be common to the entire graph
  • X1 axis (N_Genes_AB) = number of genes associated with y, and their p-value (P_value_AB)
  • X2 axis (N_Genes_A) = number of genes associated with y, and their p-value (P_value_A)
  • X3 axis (N_Genes_B) = number of genes associated with y, and their p-value (P_value_B)

I am able to make a figure with a single panel using only one set of data (either X1, X2, or X3):

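    library(ggplot2)
    theme_set(theme_bw())
    # Single panel: gene counts for the AB set, point size mapped to its p-value
    ggplot(data = Attempt_mod_TopTen) +
      geom_point(aes(x = N_Genes_AB, y = GO_Term, size = P_value_AB)) +
      scale_size(range = c(4, .2))  # reversed range: smaller p-values get larger points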

[Image: bubble plot displaying the relation between the GO terms, the number of genes, and their P values]

I saw that there are posts on how to make bubble plots with multiple panels (e.g. with `facet_wrap`), but I guess I am doing something wrong with my data, because I can't work out how to incorporate the data for X2 and X3. I would like to display what I can currently make in a single panel as 3 panels next to each other that share the same y axis.

camille
Peps
  • Are you sure you have enough space to plot three panels and those long y axis labels? It will need to be a very large plot. You can do what you are asking by pivoting your data into long format, so that your three columns of x values become two columns: one holding all the actual values, and a new column labelling which x axis each value refers to. You then facet on the new column. Check out `tidyr::pivot_longer`; if you get stuck, you will need to edit your question to include your data, using `dput(Attempt_mod_TopTen)` to get it in a cut-and-paste format suitable for Stack Overflow. – Allan Cameron Dec 27 '20 at 22:07
  • Welcome to Stack Overflow! Help us help you: provide a minimal, reproducible example. In particular, it will be hard to provide meaningful help without access to (at least a subset of) your data. You can edit your question to include the output of the R command `dput(Attempt_mod_TopTen)` so that we can better help. – duckmayr Dec 27 '20 at 22:08
  • @duckmayr thank you, I did insert the output of the `dput` command – Peps Dec 28 '20 at 08:41

1 Answer


As Allan Cameron suggested in the comments, the key here is changing the way your data are structured via `tidyr::pivot_longer()`. This can be accomplished like so:

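    library(tidyverse)

    # Reshape to long format: one row per GO term and group
    dat <- Attempt_mod_TopTen %>%
        pivot_longer(
            !GO_Term,                       # pivot every column except GO_Term
            names_to = c(".value", "grp"),  # 1st capture names the value column, 2nd goes to grp
            names_pattern = "([A-Z]_[A-Za-z]+)_([A-Z]+)"  # e.g. "N_Genes_AB" -> "N_Genes", "AB"
        )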
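To see what this reshaping produces, here is a minimal sketch on a made-up table. The data frame `toy` and all of its values are hypothetical stand-ins for `Attempt_mod_TopTen` (the real data are not shown here); only the column naming scheme is taken from the question.

```r
library(tidyr)

# Hypothetical stand-in for Attempt_mod_TopTen (all values are made up)
toy <- data.frame(
  GO_Term    = c("term_1", "term_2"),
  N_Genes_AB = c(10, 20),
  P_value_AB = c(0.01, 0.04),
  N_Genes_A  = c(5, 8),
  P_value_A  = c(0.02, 0.03),
  N_Genes_B  = c(7, 12),
  P_value_B  = c(0.005, 0.05)
)

# Same call as in the answer, applied to the toy table
toy_long <- pivot_longer(
  toy,
  !GO_Term,
  names_to = c(".value", "grp"),
  names_pattern = "([A-Z]_[A-Za-z]+)_([A-Z]+)"
)

# Two terms x three groups = six rows,
# with columns GO_Term, grp, N_Genes and P_value
print(toy_long)
```

Each original row contributes one row per group, so the long table has three times as many rows as the wide one.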

The `pivot_longer()` call above is a straightforward application of the "Multiple observations per row" subsection of the "Longer" section in the "Pivoting" vignette of the tidyr package, which you can open with the R command `vignette("pivot", package = "tidyr")`. I strongly suggest reading through it for an in-depth understanding of what was done here.

In short, the call turns every row into three rows: one for group "A", one for group "B", and one for group "AB". The `.value` entry in `names_to` tells `pivot_longer()` to use the first captured part of each column name ("N_Genes" or "P_value") as the name of an output column, while the second captured part ("A", "B", or "AB") goes into the new `grp` column. So instead of six columns for our observations of "N_Genes" and "P_value", we need just two. Now we can easily use `facet_wrap()`, with the newly created `grp` column dictating the facets:

    ggplot(data = dat) +
        facet_wrap(~grp) +  # one panel per group (A, AB, B)
        geom_point(aes(x = N_Genes, y = GO_Term, size = P_value)) +
        scale_size(range = c(4, .2)) +
        # Axis text, titles, and ticks are hidden here
        theme(
            axis.text = element_blank(),
            axis.title = element_blank(),
            axis.ticks = element_blank()
        )
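If the panels should appear in a particular order and strictly side by side, one option is to convert `grp` to a factor and pass `nrow = 1` to `facet_wrap()`. The following is a sketch under assumptions: the `toy` table and its values are made up, and the order AB, A, B is only a guess at what is wanted, based on how the columns are listed in the question.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for Attempt_mod_TopTen (all values are made up)
toy <- data.frame(
  GO_Term    = c("term_1", "term_2"),
  N_Genes_AB = c(10, 20),
  P_value_AB = c(0.01, 0.04),
  N_Genes_A  = c(5, 8),
  P_value_A  = c(0.02, 0.03),
  N_Genes_B  = c(7, 12),
  P_value_B  = c(0.005, 0.05)
)

toy_long <- pivot_longer(
  toy,
  !GO_Term,
  names_to = c(".value", "grp"),
  names_pattern = "([A-Z]_[A-Za-z]+)_([A-Z]+)"
)

# Fix the panel order; a character column would be sorted as A, AB, B
toy_long$grp <- factor(toy_long$grp, levels = c("AB", "A", "B"))

p <- ggplot(toy_long) +
  facet_wrap(~grp, nrow = 1) +  # three panels in one row, shared y axis
  geom_point(aes(x = N_Genes, y = GO_Term, size = P_value)) +
  scale_size(range = c(4, .2))

p
```

With the default `scales = "fixed"`, all panels share the same y axis and the term labels are drawn only once, on the left. If the gene counts differ a lot between groups, `facet_wrap(~grp, nrow = 1, scales = "free_x")` lets each panel use its own x range.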


duckmayr
  • Thank you. I am reading the documentation for `pivot_longer`; it is not easy to understand for an entry-level R user. I guess when I made that table with multiple columns in Excel I could have laid it out differently, in a way suitable for `facet_wrap`. These data will eventually have to be published, so can we have some sort of protection? (I don't know, can you delete the actual data and leave just the suggestions you gave me?) – Peps Dec 29 '20 at 07:49
  • @Peps Yeah, data reshaping functions are famously difficult to understand, and that's actually not unique to R. The economists I know who use Stata joke about how many times they have to look up the help file for the `reshape` command. I can edit the answer a little later today to remove your actual data – duckmayr Dec 29 '20 at 13:44
  • Thanks, can you delete the graphs as well? – Peps Dec 29 '20 at 15:21
  • @Peps Unfortunately, with neither data nor the output chart, the Q&A won't be very useful to other users. Maybe you can create a dummy dataset that still recreates the issue – camille Dec 29 '20 at 16:06
  • @Peps Surely the plots shouldn't be a problem. I understand not wanting the data easily visible to anyone who stumbles across the post (though note they can go through the edit history!) if you're working on pushing out a paper and don't want to be "scooped". (Though, to be honest, I'd guess this is a pretty low-danger place to have posted.) But I'm not sure how the plot could cause that issue. I think the best solution is, as @camille mentions, creating a dummy dataset that is similar in structure. – duckmayr Dec 29 '20 at 22:24
  • @camille and @duckmayr Yeah, I don't think I could be scooped either, but besides the data itself, I might include the plot in a figure of the paper (I am trying to change the color of the dots, still figuring out how), and it could be a problem if the same plot is posted here. But I agree, the Q&A has to be available to everyone. How about we delete the labels of the X axis? This should solve any problem and any future conflict – Peps Dec 30 '20 at 09:11
  • @Peps Axis labels removed – duckmayr Dec 30 '20 at 14:21
  • @duckmayr Sorry, I meant to remove all the small labels on the X axis and leave the words GO_Term and N_Genes, which are actually used in the code, so they can help with orientation. Also, using your Q&A I was able to label the dots based on positive and negative values (which have an important meaning in Genetics). I don't mind inserting the code here if you think it can be helpful. – Peps Dec 30 '20 at 15:09
  • @Peps Done. Feel free to [accept the answer if it solved your issue!](https://stackoverflow.com/help/someone-answers) – duckmayr Dec 30 '20 at 15:17