
I have data that I would like to compare in a grouped boxplot, i.e. comparing the before/after (pre/post) response to each treatment. The issue is that the number of trials differs between treatments, so the vectors have different lengths and I cannot put them into one data frame: `data.frame()` throws an error.

QXpre <- c(3,4,2,1,4,5,4,2,8)
QXpost <- c(0,4,0,0,0,7,0,1,6)
lidopre <-c(5,3,4,5,6)
lidopost <- c(0,0,0,1,2)
vehipre <- c(3,3,5,3,4,3,4)
vehipost <- c(4,3,3,12,6,4,10)

DF1D <- data.frame(QXpre, QXpost, lidopre, lidopost, vehipre, vehipost)
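For context, here is a minimal base-R check (an editor's sketch, not part of the question) of why this call fails. `data.frame()` needs all columns to have the same number of rows, and it only recycles a shorter vector when its length divides the longest one. With 9, 5 and 7 trials that is impossible, so the call errors. The exact wording of the message may vary between R versions.

```r
QXpre   <- c(3, 4, 2, 1, 4, 5, 4, 2, 8)
lidopre <- c(5, 3, 4, 5, 6)
vehipre <- c(3, 3, 5, 3, 4, 3, 4)

# Number of trials per treatment: 9, 5 and 7
n_trials <- lengths(list(QX = QXpre, lido = lidopre, vehi = vehipre))
print(n_trials)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  data.frame(QXpre, lidopre, vehipre),
  error = function(e) conditionMessage(e)
)
print(msg)
```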

To clarify: within each treatment group I would like to compare the pre and post values, but have all groups appear on the same plot so I can compare statistics across groups.

Thank you!

2 Answers


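Instead of putting all vectors into one data frame, create a list of data frames, one per treatment. Then reshape each one to long (tidy) format using e.g. tidyr::pivot_longer and bind them by rows; for convenience I use purrr::imap_dfr, which does both in one step. Inside imap_dfr, `.y` is the name of the list element, so `names_prefix = .y` strips the treatment prefix from the column names, leaving just `pre` and `post`: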

library(tidyverse)

dat <- list(
  QX = data.frame(QXpre, QXpost),
  lido = data.frame(lidopre, lidopost),
  vehi = data.frame(vehipre, vehipost)
) |> 
  purrr::imap_dfr(~ tidyr::pivot_longer(.x, everything(), names_prefix = .y), .id = "treatment")

head(dat)
#> # A tibble: 6 × 3
#>   treatment name  value
#>   <chr>     <chr> <dbl>
#> 1 QX        pre       3
#> 2 QX        post      0
#> 3 QX        pre       4
#> 4 QX        post      4
#> 5 QX        pre       2
#> 6 QX        post      0

dat$name <- factor(dat$name, levels = c("pre", "post")) # plot pre before post

ggplot(dat, aes(treatment, value, fill = name)) +
  geom_boxplot()

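If you would rather avoid the tidyverse dependencies, the same idea (one data frame per treatment, reshape each to long format, bind by rows) can be sketched in base R. The helper `to_long()` is hypothetical, written only for this illustration. Unlike `pivot_longer()`, it stacks all `pre` values before all `post` values within each treatment instead of interleaving them, which should not matter for the boxplot.

```r
QXpre    <- c(3, 4, 2, 1, 4, 5, 4, 2, 8)
QXpost   <- c(0, 4, 0, 0, 0, 7, 0, 1, 6)
lidopre  <- c(5, 3, 4, 5, 6)
lidopost <- c(0, 0, 0, 1, 2)
vehipre  <- c(3, 3, 5, 3, 4, 3, 4)
vehipost <- c(4, 3, 3, 12, 6, 4, 10)

# One data frame per treatment; within a treatment, pre and post have equal length
dfs <- list(
  QX   = data.frame(QXpre, QXpost),
  lido = data.frame(lidopre, lidopost),
  vehi = data.frame(vehipre, vehipost)
)

# Hypothetical helper: wide data frame -> long, one row per value,
# with the treatment prefix removed from the column names ("QXpre" -> "pre")
to_long <- function(d, prefix) {
  data.frame(
    treatment = prefix,
    name      = rep(substring(names(d), nchar(prefix) + 1), each = nrow(d)),
    value     = unlist(d, use.names = FALSE)
  )
}

# Apply to each list element together with its name, then bind by rows
long <- do.call(rbind, Map(to_long, dfs, names(dfs)))
rownames(long) <- NULL
long$name <- factor(long$name, levels = c("pre", "post"))
```

The result can be passed to the same `ggplot()` call as above, with `long` in place of `dat`.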

stefan
  • Thank you, this is exactly what I had in mind! Could you perhaps elaborate on how I could change the order? Here the post (in red) shows up before the pre (in blue), and I would like it to be the other way around. The order is correct in the dataset but not in the plot. – Nurit Eliana Feb 16 '23 at 09:03
  • See my edit. In general, the order in the data has no effect on the order in the plot. Instead, convert your column, i.e. `name`, to a factor with the levels set in your desired order. – stefan Feb 16 '23 at 09:07
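To see why the row order is ignored: a character column, or a factor created without explicit levels, gets the sorted unique values as its levels, and ggplot2 uses the level order for the dodged boxes and the legend. Because "post" sorts before "pre", post is drawn first. A small base-R sketch with toy values:

```r
# Values in the order they appear in the data: pre first, then post
time <- c("pre", "post", "pre", "post")

# Default factor(): levels are the sorted unique values, so "post" comes first
default_levels <- levels(factor(time))
print(default_levels)

# Explicit levels: this is the order ggplot2 will use for fill and dodging
ordered_levels <- levels(factor(time, levels = c("pre", "post")))
print(ordered_levels)
```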

Just to offer another solution: you can create a named list of all your vectors and then use stack() to create a data frame in long format. Afterwards, use strsplit() to split the names into two variables, group and time point. The rest is the same as in stefan's answer.
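The two base-R steps can be checked on a tiny named list (toy values, not the question's data). `stack()` returns a data frame with a `values` column and an `ind` factor holding the source name. The pattern `(?<=.)(?=pre|post)` (with `perl = TRUE`) matches the empty position that has at least one character before it and `pre` or `post` after it, so each name splits into group and time. The `sub()` variant at the end is an alternative added for comparison, not part of the answer; it anchors the suffix at the end of the name and may be easier to read.

```r
toy <- list(QXpre = c(3, 4), lidopost = c(0, 1, 2))

# Long format: one row per value, source name in the factor column `ind`
s <- stack(toy)
print(s)

nm <- as.character(s$ind)

# Split at the zero-width position just before "pre" or "post"
split_mat <- do.call(rbind, strsplit(nm, "(?<=.)(?=pre|post)", perl = TRUE))
print(split_mat)

# Alternative: strip or extract the trailing "pre"/"post" with sub()
group <- sub("(pre|post)$", "", nm)
time  <- sub("^.*(pre|post)$", "\\1", nm)
```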

library(ggplot2)

vector.list = list(
  QXpre = c(3,4,2,1,4,5,4,2,8),
  QXpost = c(0,4,0,0,0,7,0,1,6),
  lidopre =c(5,3,4,5,6),
  lidopost = c(0,0,0,1,2),
  vehipre = c(3,3,5,3,4,3,4),
  vehipost = c(4,3,3,12,6,4,10)
)

df <- stack(vector.list) # creates a data.frame in long format
# split names like "QXpre" into group ("QX") and time ("pre"),
# cutting at the empty position just before "pre" or "post"
df[, c("group", "time")] <- do.call(rbind,
  strsplit(as.character(df$ind), "(?<=.)(?=pre|post)", perl = TRUE))

df$time <- factor(df$time, levels = c("pre", "post")) # set the order of pre and post

ggplot(df, aes(group, values, fill = time)) +
  geom_boxplot()

Created on 2023-02-16 by the reprex package (v2.0.1)

Gilean0709
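Since the question also asks about comparing statistics across groups, here is a sketch (an addition, base R only) that rebuilds the long data frame from the second answer and computes the median and the number of trials for each treatment and time point with `aggregate()`. The medians correspond to the middle lines of the boxes; the hinges and whiskers that `geom_boxplot()` draws are computed separately and are not reproduced here.

```r
vector.list <- list(
  QXpre    = c(3, 4, 2, 1, 4, 5, 4, 2, 8),
  QXpost   = c(0, 4, 0, 0, 0, 7, 0, 1, 6),
  lidopre  = c(5, 3, 4, 5, 6),
  lidopost = c(0, 0, 0, 1, 2),
  vehipre  = c(3, 3, 5, 3, 4, 3, 4),
  vehipost = c(4, 3, 3, 12, 6, 4, 10)
)

# Long format, then split names like "QXpre" into group and time
df <- stack(vector.list)
df[, c("group", "time")] <- do.call(rbind,
  strsplit(as.character(df$ind), "(?<=.)(?=pre|post)", perl = TRUE))
df$time <- factor(df$time, levels = c("pre", "post"))

# Median of the values for every group/time combination
medians <- aggregate(values ~ group + time, data = df, FUN = median)
print(medians)

# Number of trials per combination, useful given the unequal group sizes
counts <- aggregate(values ~ group + time, data = df, FUN = length)
print(counts)
```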