I'm trying to make a boxplot with the ggplot2 package in RStudio. I've been reading through past ggplot2 questions, but this is so basic that I can't find it covered in detail... I'm bad at using R.

This is my very basic code that I'm trying to use but I don't know my x and y values?

ggplot(data, aes(x,y)) + geom_boxplot()

So, my y values are Pearson coefficients, which are all between 0 and 1, but I'm struggling to put that in as a range. Then I'm just confused because my x values are just 4 different conditions. Should I use a vector? e.g. c(drug 6hr, control, drug 24hr, control)

I successfully made a basic boxplot using boxplot(), but I am using ggplot2 because I want to show every individual value on the plot using jitter, which I have also failed to get working.

Sorry I have only been using R for about 6 months! Trying to learn as much as I can.

My data:

drug 6hr, control, drug 24hr, control
0.876   0.707   0.709   0.521
0.084   0.275   0.468   0.795
0.911   0.985   0.565   0.150
0.503   0.584   0.693   0.766
0.363   0.102   0.775   0.640
0.219   0.888   0.724   0.516
0.041   0.277   0.877   0.216
0.206   0.974   0.771   0.434
0.787   0.725   0.671   0.916
0.896   0.873   0.443   0.693
0.396   0.641   0.525   0.471
0.250   0.184   0.467   0.537
0.094   0.453   0.641   0.910
0.750   0.748   0.634   0.007
0.026   0.263   0.069   0.725
0.109           0.227   0.535
0.780           0.811   0.241
0.710           0.568   0.029
0.676           0.114   0.237
0.610           0.260   0.241
0.170           0.728   0.405
0.025           0.815   0.914
0.022           0.329   0.766
0.039           0.714
0.034           0.096
0.402           0.988
0.649
0.564
0.190
0.844
0.920
0.744
0.871
0.565
geom
  • `control` - duplicate column names are not allowed in data.frame – akrun Mar 01 '20 at 21:10
  • The x axis in a boxplot is categorical, so you need to have all your y values in a single column and have another column of the same length which labels each measurement according to its group. – Allan Cameron Mar 01 '20 at 21:16

1 Answer


You need to reshape your data frame into a longer format; that makes it much easier to get your boxplot with ggplot2.

Here, I'm using the pivot_longer function from the tidyr package to transform your data into two columns: the first holds the name of the condition and the second holds the values:

library(tidyr)
library(dplyr)
DF %>% pivot_longer(everything(), names_to = "var",values_to = "values") 

# A tibble: 136 x 2
   var        values
   <chr>       <dbl>
 1 drug_6hr    0.876
 2 Control_6   0.707
 3 drug_24hr   0.709
 4 Control_24  0.521
 5 drug_6hr    0.084
 6 Control_6   0.275
 7 drug_24hr   0.468
 8 Control_24  0.795
 9 drug_6hr    0.911
10 Control_6   0.985
# … with 126 more rows

Then, you can add the plotting part to the pipe sequence (the %>% symbol): pass the reshaped data frame to ggplot, set the aes arguments, and add the geom_boxplot and geom_jitter functions:

library(tidyr)
library(dplyr)
library(ggplot2)
DF %>% pivot_longer(everything(), names_to = "var",values_to = "values") %>%
  ggplot(aes(x = var, y = values, fill = var, color = var))+
  geom_boxplot(alpha = 0.2)+
  geom_jitter()
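
One thing to watch with this combination: geom_boxplot() draws outliers as points and geom_jitter() draws every observation again, so an outlier can show up twice, and the default jitter also nudges points vertically. A minimal sketch of one way around that, using a small made-up data frame (toy_long and its values are illustrative only, not your data):

```r
library(ggplot2)

# Hypothetical long-format data, for illustration only
toy_long <- data.frame(
  var    = rep(c("drug_6hr", "Control_6"), each = 5),
  values = c(0.88, 0.08, 0.91, 0.50, 0.36, 0.71, 0.28, 0.99, 0.58, 0.10)
)

p <- ggplot(toy_long, aes(x = var, y = values, fill = var, color = var)) +
  # outlier.shape = NA hides the boxplot's own outlier points
  geom_boxplot(alpha = 0.2, outlier.shape = NA) +
  # height = 0 keeps the true y values; width sets the horizontal spread
  geom_jitter(width = 0.2, height = 0)

p
```

The width of 0.2 is an arbitrary choice; adjust it until the points are spread enough to read but still sit inside their box.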

To get rid of the warning messages caused by the NA values, you can filter them out by adding a filter step between pivot_longer and ggplot:

DF %>% pivot_longer(everything(), names_to = "var",values_to = "values") %>%
  filter(!is.na(values)) %>%
  ggplot(aes(x = var, y = values, fill = var, color = var))+
  geom_boxplot(alpha = 0.2)+
  geom_jitter()
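
If you would rather drop the NA values during the reshape itself, pivot_longer() has a values_drop_na argument. A minimal sketch on a small made-up data frame (toy_wide is hypothetical; with your data you would use DF instead):

```r
library(tidyr)

# Hypothetical wide data with unequal group sizes padded by NA
toy_wide <- data.frame(
  drug_6hr  = c(0.876, 0.084, 0.911),
  Control_6 = c(0.707, 0.275, NA)
)

toy_long <- pivot_longer(
  toy_wide,
  everything(),
  names_to = "var",
  values_to = "values",
  values_drop_na = TRUE  # rows whose value is NA are dropped here
)

nrow(toy_long)  # 5: six cells minus the one NA
```

This should leave the same rows as the filter(!is.na(values)) step, so the ggplot part of the pipe stays unchanged.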


Does it answer your question?

Reproducible example

I edited your example to make it easier to read into R. I also modified the column names, as pointed out by @akrun:

DF <- structure(list(drug_6hr = c(0.876, 0.084, 0.911, 0.503, 0.363, 
0.219, 0.041, 0.206, 0.787, 0.896, 0.396, 0.25, 0.094, 0.75, 
0.026, 0.109, 0.78, 0.71, 0.676, 0.61, 0.17, 0.025, 0.022, 0.039, 
0.034, 0.402, 0.649, 0.564, 0.19, 0.844, 0.92, 0.744, 0.871, 
0.565), Control_6 = c(0.707, 0.275, 0.985, 0.584, 0.102, 0.888, 
0.277, 0.974, 0.725, 0.873, 0.641, 0.184, 0.453, 0.748, 0.263, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA), drug_24hr = c(0.709, 0.468, 0.565, 0.693, 0.775, 
0.724, 0.877, 0.771, 0.671, 0.443, 0.525, 0.467, 0.641, 0.634, 
0.069, 0.227, 0.811, 0.568, 0.114, 0.26, 0.728, 0.815, 0.329, 
0.714, 0.096, 0.988, NA, NA, NA, NA, NA, NA, NA, NA), Control_24 = c(0.521, 
0.795, 0.15, 0.766, 0.64, 0.516, 0.216, 0.434, 0.916, 0.693, 
0.471, 0.537, 0.91, 0.007, 0.725, 0.535, 0.241, 0.029, 0.237, 
0.241, 0.405, 0.914, 0.766, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA)), row.names = c(NA, -34L), class = c("data.table", "data.frame"
))
dc37
  • It worked! It generated a boxplot for me but this error message did appear: ```Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class uneval Did you accidentally pass `aes()` to the `data` argument?``` – geom Mar 01 '20 at 21:48
  • I've fixed that but now it's: ```Warning messages: 1: Removed 38 rows containing non-finite values (stat_boxplot). 2: Removed 38 rows containing missing values (geom_point).``` – geom Mar 01 '20 at 22:02