I have been scratching my head at this for ages and I cannot figure out for the life of me what I'm doing wrong.

I'm aware this is very similar to a few other questions, most notably "How to plot specific colors and shapes for ggplot2 scatter plot?", but it's by following the answer to that question that I arrived at my current problem, and I have no idea what's gone wrong.

So, here is my data:

comb_frame <- structure(list(decode_beta = c("0.00279501", "-0.0098421", "-0.025254", 
                                             "0.00172701", "0.00531102", "0.000274217", "0.00594772859800487", 
                                             "0.000376995", "0.00082946", "0.00357124647463984", "-0.0018971", 
                                             "0.0083565", "0.00356544", "-0.000609096", "0.00167749", "-0.0150423", 
                                             "-0.022448", "-0.00242648", "-0.00190033", "-0.022692", "0.00536424", 
                                             "-0.00100278", "0.0073661", "0.00092082", "-0.00263694", "0.0076137", 
                                             "0.0072423", "-0.00081708", "-0.01708", "0.00211079", "0.0011098", 
                                             "-0.000107087", "0.0022284", "0.00068709", "-0.00562316159145804", 
                                             "0.00112658", "0.00207365", "-0.000287835", "-0.00286597", "-0.027999", 
                                             "0.00503866", "0.00305786", "-0.001238", "0.0071804", "-0.0084529", 
                                             "0.00556481", "-1.9459e-05", "0.000191271", "-0.017995", "0.002799", 
                                             "-0.024888", "-0.008418", "0.02257", "-0.008174", "-0.019886", 
                                             "-0.00492105", "0.00362115", "0.00392446", "0.00281645"), scallop_beta = c(-0.01011621546, 
                                                                                                                        0.0047657725, -0.02134944, -0.0016247829, 0.0044858415, -0.0015072187, 
                                                                                                                        -0.00782423635, -0.0013813875, -0.001077867, 0.02124057075, 0.0019690364, 
                                                                                                                        -0.004913727, 0.00098559246, 0.00302699872, -0.000395703, -0.02609645934, 
                                                                                                                        -0.02794527222, 0.000946532, 0.000786876, -0.00685633312, -0.004700096, 
                                                                                                                        0.00198448425, 0.00497280424, -0.00480984096, -0.00251334656, 
                                                                                                                        8.4434e-05, 0.00185996837, 0.001175848, -0.01947989552, -0.001227005, 
                                                                                                                        -0.0038851968, -0.00650484, -0.00262378296, 0.003949936, 0.0113079946, 
                                                                                                                        -0.00216854672, -0.000730496, 0.001289556, 0.004527388, -0.01095271456, 
                                                                                                                        0.00580293467, 0.00515290737, 0.000929589, -0.00292289712, 0.0053226888, 
                                                                                                                        -3.969984e-05, -0.0115784, 0.0030260514, -0.00695347872, 0.0092864585, 
                                                                                                                        -0.01863179184, 7.274624e-05, 0.00208976, 0.00042348704, -0.00965808, 
                                                                                                                        -0.0048684602, 0.0045743228, 0.00489489, -0.002105883), significance = c("SCAL SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "NON SIG", "DEC SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "SCAL SIG", "NON SIG", "DEC SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "NON SIG", "DEC SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "DEC SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "SCAL SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "SCAL SIG", "NON SIG", "DEC SIG", "SCAL SIG", "NON SIG", 
                                                                                                                                                                                                 "DEC SIG", "NON SIG", "DEC SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", "NON SIG", 
                                                                                                                                                                                                 "NON SIG", "SCAL SIG", "SCAL SIG", "NON SIG")), row.names = c(NA, 
                                                                                                                                                                                                                                                               -59L), class = "data.frame")

I am trying to create a scatterplot of the two sets of betas and then colour them by their respective significance in two separate data sets (defined by the third column).

Based on the question I shared I do this:

comb_frame$significance = factor(comb_frame$significance, levels = (unique(comb_frame$significance))) ### First I changed significance into a factor

frame_colours = ifelse(comb_frame$significance == "DEC SIG", "#FF0000", ifelse(comb_frame$significance == "SCAL SIG", "#00A08A", "Gray")) ### I make a vector of the three colours I want

### Then I plot my graph as follows:

ggplot(comb_frame, aes(x = decode_beta, y = scallop_beta)) +
    theme_classic() +
    labs(x = "DeCODE beta (adjusted)",
         y = "SCALLOP beta (adjusted)",
         title = paste0("Proteomics PheWAS correlations ", curr_path)) +
    geom_abline(intercept = 0) +
    geom_smooth(method = "lm", se = FALSE, colour = "red") +
    geom_point(aes(colour = significance)) + 
    scale_color_manual(breaks = unique(comb_frame$significance), values = frame_colours)

This very nearly works and produces the following:

Plot output

But as you can see, it is only colouring some of the points. It's colouring those points correctly, but it's then not adding the third colour for some reason and I cannot figure out what's gone wrong.

I have also tried doing this with the significance column left as a character vector rather than a factor, with the same results.

tjebo
Sabor117

1 Answer

OP, the vector supplied to the values= argument of scale_color_manual() maps colors onto the distinct values of whatever is mapped to the color aesthetic (here, comb_frame$significance). There are only 3 levels in comb_frame$significance ("SCAL SIG", "NON SIG", and "DEC SIG"), yet frame_colours is a vector with 59 values. Consequently, only the first three values of that vector are used, and they are mapped to the three levels in the order of the levels themselves.

The first 3 values in frame_colours are:

"#00A08A" "Gray"    "Gray" 
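To see the mismatch without plotting anything, here is a small base-R sketch. It uses only the first five rows of the significance column as a toy stand-in (the names sig, per_row and paired are made up for illustration), and it mimics the positional pairing described above rather than reproducing ggplot2's internals:

```r
# Toy stand-in: the first five values of comb_frame$significance,
# with levels in order of first appearance (as unique() would give)
sig <- factor(c("SCAL SIG", "NON SIG", "NON SIG", "NON SIG", "DEC SIG"),
              levels = c("SCAL SIG", "NON SIG", "DEC SIG"))

# The question's approach: one colour per ROW, not one per level
per_row <- ifelse(sig == "DEC SIG", "#FF0000",
                  ifelse(sig == "SCAL SIG", "#00A08A", "Gray"))

length(per_row)  # as many colours as rows
nlevels(sig)     # but only three levels need a colour

# Pair each level with the colour at the same position,
# which is effectively what happens to the leading entries
paired <- setNames(per_row[seq_len(nlevels(sig))], levels(sig))
paired
```

In this sketch "DEC SIG" ends up paired with "Gray" because the third row happens to be a "NON SIG" row, which matches the missing red in the plot.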

So that's why you see that green color (#00A08A) and the others look gray. What you want to do is set values= equal to a vector that can map the colors directly to each level. I find it's easiest to do this via a named vector. Try replacing your frame_colours = ifelse(...) line with this:

frame_colours = c(
  "DEC SIG"="#FF0000",
  "SCAL SIG"= "#00A08A",
  "NON SIG"="Gray")

Running the plot code then gives you this:

enter image description here

You don't have to supply a named vector, but you must specify in values a vector that has at least as many items as there are levels in what is mapped to the color aesthetic.
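As a rough sketch of the unnamed variant: the data frame toy and the names sig_breaks and sig_colours below are hypothetical, and the example assumes a reasonably recent ggplot2 (3.3.0 or later, as far as I know), where unnamed values are matched in order against breaks when breaks is supplied:

```r
library(ggplot2)

# Hypothetical toy data standing in for comb_frame
toy <- data.frame(
  x = c(-0.02, -0.01, 0, 0.01, 0.02, 0.03),
  y = c(-0.015, -0.02, 0.001, 0.012, 0.018, 0.02),
  significance = c("SCAL SIG", "NON SIG", "NON SIG",
                   "DEC SIG", "NON SIG", "SCAL SIG")
)

# Two parallel vectors of length 3: the i-th colour belongs to the i-th break
sig_breaks  <- c("DEC SIG", "SCAL SIG", "NON SIG")
sig_colours <- c("#FF0000", "#00A08A", "Gray")

p <- ggplot(toy, aes(x = x, y = y, colour = significance)) +
  geom_point() +
  scale_colour_manual(breaks = sig_breaks, values = sig_colours)
p
```

The named vector is still the safer choice, since it does not depend on the two vectors staying in the same order.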

chemdork123
  • Oh my god... What an absolute oversight. I thought it had to be a vector mapping a colour per point. It's a colour per group... Thanks so much! – Sabor117 Jan 06 '23 at 19:37
  • You can do that (map a color per point), but you use the special color function `scale_color_identity()` and map to a COLUMN in your dataset that has those colors listed. It's usually far more convenient to have the colors mapped for you though... – chemdork123 Jan 06 '23 at 19:42
  • That's actually potentially also incredibly useful as I have tried to map by column before! Thanks so much! – Sabor117 Jan 11 '23 at 17:13
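For the per-point approach mentioned in the comments, a minimal sketch might look like the following. The data frame toy and the column point_colour are hypothetical names chosen for illustration; the colours come from the same ifelse() logic as in the question, stored as a column instead of being passed to values=:

```r
library(ggplot2)

# Hypothetical toy data standing in for comb_frame
toy <- data.frame(
  x = c(-0.02, -0.01, 0, 0.01, 0.02, 0.03),
  y = c(-0.015, -0.02, 0.001, 0.012, 0.018, 0.02),
  significance = c("SCAL SIG", "NON SIG", "NON SIG",
                   "DEC SIG", "NON SIG", "SCAL SIG")
)

# One colour per row, kept in the data frame itself
toy$point_colour <- ifelse(toy$significance == "DEC SIG", "#FF0000",
                           ifelse(toy$significance == "SCAL SIG", "#00A08A", "Gray"))

# scale_colour_identity() uses the column's values as the colours directly
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(colour = point_colour)) +
  scale_colour_identity()
p
```

Note that the identity scale draws no legend by default; if one is needed, its guide argument can be set to "legend", at which point the manual scale is usually less work.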