
I'm plotting two variables (frequency and fold change) with geom_point. As in the figure below, size corresponds to frequency and colour to fold change (FC < 1.5 = green, FC > 1.5 = red). To distinguish fold changes below 1, I have introduced more breaks. The plot turns out as expected, but I am wondering whether there is a way to make the legend equally spaced for the breaks c(0, 0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6) (see "Desired scale" on the right of the plot). Does anyone know how to achieve this? Thanks a lot in advance!

[Figure: the current plot and its colour legend, with the "Desired scale" legend drawn to the right]

Below is the code for the plot.

p <- ggplot(mainG, aes(x = Allele, y = Cohort, size = Freq, color = FC)) +
  geom_point() +
  scale_y_discrete(limits = rev(levels(mainG$Cohort)), position = "right") +
  scale_size_continuous(limits = c(0, 0.5), breaks = c(0, 0.05, 0.10, 0.20, 0.40)) +
  # Underrepresented: FC < 1.5 ; overrepresented: FC > 1.5
  scale_colour_gradientn(
    colours = c('darkgreen', 'forestgreen', 'darkseagreen3', 'darkseagreen2',
              'indianred1', 'indianred2', 'indianred3', 'darkred'),
    values = c(0, 0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6)/6,
    breaks = c(0, 0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6)) +
  xlab("") + ylab("") +
  theme(axis.text = element_text(size = 7),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        axis.ticks.y = element_blank(),
        panel.border = element_rect(fill = NA),
        panel.background = element_blank(),
        axis.line = element_line(), 
        legend.title = element_text(size = 7),
        legend.text = element_text(size = 7),
        legend.key = element_blank(), 
        legend.position = "right",
        panel.spacing = unit(0.2, "lines"), 
        strip.background =element_rect(colour = "black", fill = NA),
        strip.text = element_text(size = 7, face = "bold", margin = margin(0.1,5,0.1,5, "cm"))
  ) +
  facet_grid(.~ Gene, scales = "free", space = "free")

p + guides(colour = guide_colourbar(barwidth = unit(0.5, "cm"), barheight = unit(5, "cm"), 
                                    direction = "vertical"),
           size = guide_legend(title = "Frequency", reverse = T))

Edit: here's the sample data of mainG for reproducibility (sorry the section is becoming lengthy...)

Cohort  Gene    Allele  Freq    FC
Cohort1 B   *08:01  0.027   0.24
Cohort2 A   *01:01  0.103   0.63
Cohort2 A   *68:01  0.103   3.63
Cohort2 B   *07:02  0.207   1.59
Cohort2 B   *08:01  0.034   0.31
Cohort2 B   *51:01  0.121   2.44
Cohort3 A   *02:01  0.407   1.51
Cohort3 A   *03:01  0   NA
Cohort3 B   *07:02  0.037   0.28
Cohort3 B   *08:01  0.019   0.17
Cohort3 B   *44:03  0.148   3.15
Cohort4 A   *02:01  0.17    0.63
Cohort4 A   *24:02  0.17    2.01
Cohort4 B   *07:02  0.05    0.38
Cohort4 B   *18:01  0.11    2.41
Cohort5 B   *07:02  0.053   0.4
Cohort5 B   *18:01  0.105   2.31
Cohort6 B   *07:02  0.041   0.31
Cohort6 B   *18:01  0.122   2.69
Cohort6 B   *51:01  0.102   2.06
Control A   *01:01  0.163   NA
Control A   *02:01  0.269   NA
Control A   *03:01  0.14    NA
Control A   *24:02  0.085   NA
Control A   *26:01  0.035   NA
Control A   *29:02  0.035   NA
Control A   *32:01  0.037   NA
Control A   *68:01  0.029   NA
Control B   *07:02  0.13    NA
Control B   *08:01  0.11    NA
Control B   *18:01  0.046   NA
Control B   *44:02  0.087   NA
Control B   *44:03  0.047   NA
Control B   *51:01  0.05    NA
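For anyone who wants to reproduce the plot, one possible way to load the table above is `read.table()` with `stringsAsFactors = TRUE`, so that `levels(mainG$Cohort)` in the plotting code returns something. This is only a sketch with a handful of the rows; paste the full table into the `text` argument to get the complete data.

```r
# Read a few rows of the sample data; Cohort, Gene and Allele become factors,
# which the plotting code relies on via levels(mainG$Cohort).
mainG <- read.table(text = "Cohort  Gene Allele Freq  FC
Cohort1 B    *08:01 0.027 0.24
Cohort2 A    *01:01 0.103 0.63
Cohort2 A    *68:01 0.103 3.63
Cohort3 A    *03:01 0     NA
Control A    *01:01 0.163 NA
", header = TRUE, stringsAsFactors = TRUE)

str(mainG)
levels(mainG$Cohort)
```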
WC Lim
  • Could you please provide dput(mainG)? – TarJae Sep 03 '22 at 12:30
  • @TarJae I have edited the question to include the data for mainG – WC Lim Sep 03 '22 at 12:51
  • I think using a transformation is better than doing this manually. As with _x_ and _y_ scales, any numeric scale supports transformations, although they have to be set by passing suitable arguments. A scale like the one you request is misleading because it is not uniform: you are adding a break that is not visible. A transformation would solve the problem in a better way. Package 'ggpmisc' has `scale_x_logFC()` and `scale_y_logFC()` but not yet the equivalent color scale. – Pedro J. Aphalo Sep 03 '22 at 12:51
  • @PedroJ.Aphalo thank you for your comment. To be honest, I am still weighing the usefulness of a scale like this. Yes, as you said, it's kind of misleading, but the main purpose is to distinguish the different ranges for the lower FCs. Maybe I should try the log scales as you recommended. – WC Lim Sep 03 '22 at 12:57

2 Answers


You can do this the same way as you would use a secondary axis: transform the data and apply the inverse transformation in the scale:

ggplot(mainG, aes(x = Allele, y = Cohort, size = Freq, 
                  color = ifelse(FC < 1, FC * 4, FC + 3))) +
  geom_point() +
  scale_y_discrete(limits = rev(levels(mainG$Cohort)), position = "right") +
  scale_size_continuous(limits = c(0, 0.5), range = c(1, 10),
                        breaks = c(0, 0.05, 0.10, 0.20, 0.40)) +
  scale_colour_gradientn(name = "FC",  # the colour scale shows fold change; size shows frequency
    colours = c('darkgreen', 'forestgreen', 'darkseagreen3', 'darkseagreen2',
                'indianred1', 'indianred2', 'indianred3', 'darkred'),
    values = 0:10 / 10, breaks = 0:10, labels = ~ifelse(.x < 4, .x/4, .x-3)) +
  xlab("") + ylab("") +
  theme(axis.text = element_text(size = 7),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust=1),
        axis.ticks.y = element_blank(),
        panel.border = element_rect(fill = NA),
        panel.background = element_blank(),
        axis.line = element_line(), 
        legend.title = element_text(size = 7),
        legend.text = element_text(size = 7),
        legend.key = element_blank(), 
        legend.position = "right",
        panel.spacing = unit(0.2, "lines"), 
        strip.background =element_rect(colour = "black", fill = NA),
        strip.text = element_text(size = 7, face = "bold", 
                                  margin = margin(0.1,5,0.1,5, "cm"))) +
  facet_grid(.~ Gene, scales = "free", space = "free") +
  guides(colour = guide_colourbar(barwidth = unit(0.5, "cm"), 
                                  barheight = unit(5, "cm"), 
                                  direction = "vertical"),
           size = guide_legend(title = "Frequency", reverse = T))
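To see why this gives an evenly spaced legend, it can help to look at the two transformations on their own. The sketch below uses only base R; the function names `fc_to_scale` and `scale_to_fc` are made up for illustration and do not appear in the plot code, which inlines the same expressions in `aes()` and in `labels`.

```r
# Forward transform (used in aes()): steps of 0.25 below FC = 1 are stretched
# to the same width as steps of 1 above FC = 1.
fc_to_scale <- function(fc) ifelse(fc < 1, fc * 4, fc + 3)

# Inverse transform (used for the legend labels): maps the evenly spaced
# internal values back to fold changes. It is the plain-function form of
# the formula ~ifelse(.x < 4, .x/4, .x-3).
scale_to_fc <- function(x) ifelse(x < 4, x / 4, x - 3)

desired_breaks <- c(0, 0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6)

fc_to_scale(desired_breaks)               # evenly spaced: 0, 1, 2, ..., 9
scale_to_fc(fc_to_scale(desired_breaks))  # round trip returns the original breaks
```

Because the colours are assigned on the transformed values, the legend is uniform in the transformed space and only the labels are shown in the original fold-change units.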


Allan Cameron
  • Thanks a lot! Can you explain what the .x means in `~ifelse(.x < 4, .x/4, .x-3)`? – WC Lim Sep 03 '22 at 14:28
  • @WCLim this is tidyverse syntax for an anonymous function, where `.x` is the stand-in for the variable passed to the function. It could be rewritten as `function(x) ifelse(x < 4, x / 4, x - 3)` – Allan Cameron Sep 03 '22 at 14:42

Using a log scale gets close to what the question asks without confusing readers of the plot with a non-uniform scale. It is fairly easy to modify this example to use other log bases, other transformations or other colours. Alternatively, the transformation can be applied in the scale rather than in aes(). (I did not include the call to theme() as it is not relevant to the question.)

library(ggplot2)

ggplot(mainG, aes(x = Allele, y = Cohort, size = Freq, color = log2(FC))) +
  geom_point() +
  scale_y_discrete(limits = rev(levels(mainG$Cohort)), position = "right") +
  scale_size_continuous(limits = c(0, 0.5), 
                        breaks = rev(c(0, 0.05, 0.10, 0.20, 0.40))) +
  scale_colour_gradient2(name = "FC",
                         high = "darkred", mid = "lightyellow3", low = "darkgreen",
                         labels = function(x) {2^x}, 
                         breaks = log2(c(1/8, 1/4, 1/2, 1, 2, 4, 8))) +
  expand_limits(colour = log2(c(1/5, 5))) +
  facet_grid(.~ Gene, scales = "free", space = "free") +
  theme_bw()

Created on 2022-09-03 with reprex v2.0.2
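As a rough sketch of the alternative mentioned above (applying the transformation in the scale instead of in aes()), the variant below maps `FC` directly and asks the colour scale for a log2 transformation. It swaps in `scale_colour_gradientn()` with three colours and limits that are symmetric around FC = 1, so the middle colour falls at FC = 1; that is my substitution, not the code from the answer. The small `mainG` here is a subset of the question's data, and depending on the ggplot2 version the argument is spelled `trans` or `transform`.

```r
library(ggplot2)

# Small subset of the question's sample data (rows with a non-missing FC).
mainG <- data.frame(
  Cohort = factor(c("Cohort1", "Cohort2", "Cohort2", "Cohort3")),
  Allele = c("*08:01", "*01:01", "*68:01", "*44:03"),
  Freq   = c(0.027, 0.103, 0.103, 0.148),
  FC     = c(0.24, 0.63, 3.63, 3.15)
)

fc_breaks <- c(1/8, 1/4, 1/2, 1, 2, 4, 8)

p <- ggplot(mainG, aes(x = Allele, y = Cohort, size = Freq, colour = FC)) +
  geom_point() +
  scale_colour_gradientn(
    name    = "FC",
    colours = c("darkgreen", "lightyellow3", "darkred"),
    trans   = "log2",            # assumption: newer releases prefer `transform`
    limits  = range(fc_breaks),  # symmetric around FC = 1 on the log2 scale
    breaks  = fc_breaks
  )

p
```

With this form the breaks and limits are given in fold-change units, so no custom `labels` function is needed.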

I think the best approach is to use one of the new binned scales for continuous data from 'ggplot2'.

library(ggplot2)

ggplot(mainG, aes(x = Allele, y = Cohort, size = Freq, color = log2(FC))) +
  geom_point() +
  scale_y_discrete(limits = rev(levels(mainG$Cohort)), position = "right") +
  scale_size_continuous(limits = c(0, 0.5), 
                        breaks = rev(c(0, 0.05, 0.10, 0.20, 0.40))) +
  scale_colour_steps2(name = "FC",
                      high = "darkred", mid = "lightyellow3", low = "darkgreen",
                      labels = function(x) {2^x}, 
                      breaks = log2(c(1/8, 1/4, 1/2, 1, 2, 4, 8))) +
  expand_limits(colour = log2(c(1/5, 5))) +
  facet_grid(.~ Gene, scales = "free", space = "free") +
  theme_bw()

Created on 2022-09-03 with reprex v2.0.2
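Both snippets above rely on the same break and label arithmetic, which can be checked in base R. The sketch below also uses `findInterval()` as a rough stand-in for the binning idea; it is only an illustration with a few FC values from the question, not how ggplot2 implements binned scales internally.

```r
fc_breaks   <- c(1/8, 1/4, 1/2, 1, 2, 4, 8)
log2_breaks <- log2(fc_breaks)

log2_breaks     # evenly spaced: -3, -2, -1, 0, 1, 2, 3
2^log2_breaks   # the labels function recovers the fold changes

# A few FC values from the question's data.
fc <- c(0.24, 0.63, 1.59, 3.63)

# Index of the interval between consecutive breaks that each value falls in;
# a binned scale gives one colour per such interval.
findInterval(log2(fc), log2_breaks)
```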

Pedro J. Aphalo