How to get the perfect "Before-After" graph with connected dots and paired U test using ggplot2?

Question

My data looks like this:

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),          #patient ID
                    t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),    #evaluation before
                    t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),    #evaluation after
                    sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))            #subset

I would like to do a simple before-after graph.

So far, I have managed to get a first version of the plot with this:

library(ggplot2)
ggplot(mydata) + 
  geom_segment(aes(x = 1, xend = 2, y = t1, yend = t4), size=0.6) +
  scale_x_discrete(name = "Intervention", breaks = c("1", "2"), labels = c("T1", "T4"), limits = c(1, 2)) +
  scale_y_continuous(name = "Var") + theme_bw()

I am facing multiple issues, can you help me to...

1. add a black circle at the beginning and the end of every line? (geom_point() doesn't work)
2. make the lines smoother (look how pixelated they are, especially the second one)?
3. decrease the blank space on the left and right sides of the graph?
4. add the median for T1 and T4 (in red), link those points, compare them with a paired Mann-Whitney test and print the p-value on the graph?

I would prefer not to reformat my database to long format, because I have a lot of other variables and timepoints (not shown here). I have read other posts (such as here), but the solutions provided look so complicated for something that seems simple (yet I can't do it...). Huge thanks for your help!

I will update the graph along with progression :)

EDIT

I would like not to reformat my database to long format as I have a lot of other variables and timepoints (not shown here)...

The pixellation probably is due to the RStudio device on Windows machines. Probably if you use `ggsave()`, it is antialiased properly. — teunbrand, May 06 '20 at 11:20
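
The suggestion in the comment above can be tried with a minimal sketch like the following. It assumes the data frame from the question; the output file name, size and resolution are placeholders chosen for illustration, not values from the thread.

```r
library(ggplot2)

# Same data as in the question
mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))

p <- ggplot(mydata) +
  geom_segment(aes(x = 1, xend = 2, y = t1, yend = t4)) +
  theme_bw()

# Write the plot to a file instead of judging it in the RStudio plot pane.
# File name, dimensions (inches) and dpi are assumptions for this example.
out_file <- file.path(tempdir(), "before_after.png")
ggsave(out_file, plot = p, width = 5, height = 4, dpi = 300)
```

Opening the saved file should show whether the jagged lines come from the on-screen device rather than from the plot itself.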

Magnus Nordmo · Accepted Answer · 2020-05-06T15:28:47.243

Here is what I would do! Please feel free to ask questions about what's going on here.

library(tidyverse)

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),          #patient ID
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),    #evaluation before
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),    #evaluation after
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))      

# Paired Wilcoxon signed-rank test; p-value rounded to two decimals
pval <- wilcox.test(x = mydata$t1, y = mydata$t4, paired = TRUE, exact = FALSE)$p.value %>% round(2)

df <- mydata %>% 
  pivot_longer(2:3,names_to = "Time") %>% # Pivot into long-format
  mutate(sexe = as.factor(sexe),
         Time = as.factor(Time)) # Make factors 

ggplot(df,aes(Time,value,color = sexe,group = ID)) + 
  geom_point() + 
  geom_line() + 
  stat_summary(inherit.aes = FALSE, aes(Time, value),  # medians of t1 and t4
    geom = "point", fun = "median", col = "red", 
    size = 3, shape = 24, fill = "red"
  ) +
  annotate("text", x = 1.7, y = 60, label = paste('P-Value is', pval)) + 
  coord_cartesian(xlim = c(1.4, 1.6)) +  # narrow x limits to trim blank space at the sides
  theme_bw()
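
The plot above marks the two medians but does not link them, which the question also asked for. A minimal sketch of one way to do that, assuming the same long-format reshaping, is to add a second `stat_summary()` layer with `geom = "line"` and a constant group so that the two medians are joined by a single red line. The object names `df_long` and `p_med` are made up for this example.

```r
library(ggplot2)
library(tidyr)

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))

# Long format: one row per patient and time point
df_long <- pivot_longer(mydata, c(t1, t4), names_to = "Time")

p_med <- ggplot(df_long, aes(Time, value, group = ID)) +
  geom_line() +                     # one line per patient
  geom_point() +                    # black dots at both ends
  # group = 1 overrides the per-patient grouping, so the median is
  # computed once per time point and the two medians are connected
  stat_summary(aes(group = 1), fun = median, geom = "line", colour = "red") +
  stat_summary(aes(group = 1), fun = median, geom = "point", colour = "red", size = 3) +
  theme_bw()
```

The `fun` argument requires ggplot2 3.3.0 or later, which is also what the answer's code relies on.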

Also be aware that long-format data commonly carries along variables that simply repeat across time points. See the example here:

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),          #patient ID
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),    #evaluation before
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),    #evaluation after
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1),
                     var1 = c(1:10),
                     var2 = c(1:10),
                     var3 = c(1:10))


df <- mydata %>% 
  pivot_longer(2:3,names_to = "Time") %>% # Pivot into long-format
  mutate(sexe = as.factor(sexe),
         Time = as.factor(Time))
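
To see what the reshaping does to the extra columns, a small check along these lines may help. It is only a sketch using the example data above; `df_check` is a name made up for this illustration.

```r
library(tidyr)

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1),
                     var1 = c(1:10),
                     var2 = c(1:10),
                     var3 = c(1:10))

# Only t1 and t4 are stacked; every other column is carried along
df_check <- pivot_longer(mydata, c(t1, t4), names_to = "Time")

nrow(df_check)       # two rows per patient
table(df_check$ID)   # each ID appears twice
head(df_check)       # var1..var3 are repeated within each ID
```

The original wide `mydata` is left untouched, so the long version can be created just for plotting.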

Nice solution :) Consider adding few explanations, even without questions from the asker, so it would be easier to understand it for future viewers. — RaV, May 06 '20 at 09:29
Thanks for providing this solution. My problem is that I don't want to reformat my database to long format (this is a shortened version of my data frame, but I have a lot of data and measurements): I would prefer not to have to duplicate every line... Is there another solution? — B_slash_, May 06 '20 at 12:01
Hmm. I don't know if there is a good solution if you want the data to stay in wide format. Please be aware that `tidyr::pivot_longer` can handle very large datasets. If you are working with databases, you can use `dbplyr`. If you have a lot of individual datasets, you can write a function that does the work in a single call. — Magnus Nordmo, May 06 '20 at 12:49
See my edit. Long format data is probably more flexible than you think! — Magnus Nordmo, May 06 '20 at 15:29
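
The idea of wrapping the work in a function could look roughly like the sketch below. `plot_before_after()` is a hypothetical helper written for this illustration, not a function from any package; it keeps only the columns it needs, reshapes them internally, and leaves the wide data frame as it is.

```r
library(ggplot2)
library(tidyr)

# Hypothetical helper: column names are passed as strings
plot_before_after <- function(data, before, after, id = "ID") {
  # Keep only the needed columns, then stack the two time points
  long <- pivot_longer(data[, c(id, before, after)],
                       cols = tidyselect::all_of(c(before, after)),
                       names_to = "Time", values_to = "value")
  ggplot(long, aes(Time, value, group = .data[[id]])) +
    geom_line() +
    geom_point() +
    theme_bw()
}

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))

p_fun <- plot_before_after(mydata, before = "t1", after = "t4")
```

The same call can then be reused for any other pair of time-point columns in the wide table.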

RaV · Answer 2 · 2020-05-06T09:30:06.517

I can address issue (1), the black circles:

First, you should tidy your data, so that one column holds the information for one variable (right now, the 'Var' values on the plot are stored in two columns: 't1' and 't4'). You can achieve this with the tidyr package.

library(tidyr)
mydata_long <- pivot_longer(mydata, c(t1, t4), names_to = "t")

Now creating points is easy, and the rest of the code becomes a lot clearer: We can tell ggplot that we want 't' groups on x-axis, their values on y-axis and in case of lines, we want them separate for every 'ID'.

ggplot(mydata_long) +
  geom_line(aes(x = t, y = value, group = ID)) + #plotting lines
  geom_point(aes(x = t, y = value)) + #plotting points
  labs(x = "Intervention", y = "Var") + #changing labels
  theme_bw()

Thanks for providing this solution. My problem is that I don't want to reformat my database to long format (this is a shortened version of my data frame, but I have a lot of data and measurements): I would prefer not to have to duplicate every line... Is there another solution? — B_slash_, May 06 '20 at 12:01
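
For readers who do want to stay in wide format, one possible sketch is to keep `geom_segment()` from the question, add one `geom_point()` layer per time point, and use a continuous x scale so that positions 1 and 2 can be relabelled. This is an illustration under those assumptions rather than a method given in the answers; the annotation position and the axis limits are arbitrary choices.

```r
library(ggplot2)

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))

# Paired Wilcoxon signed-rank test on the two wide columns
pval <- wilcox.test(mydata$t1, mydata$t4, paired = TRUE, exact = FALSE)$p.value
med <- c(median(mydata$t1), median(mydata$t4))

p_wide <- ggplot(mydata) +
  geom_segment(aes(x = 1, xend = 2, y = t1, yend = t4)) +  # one line per patient
  geom_point(aes(x = 1, y = t1)) +                         # dots at T1
  geom_point(aes(x = 2, y = t4)) +                         # dots at T4
  annotate("segment", x = 1, xend = 2, y = med[1], yend = med[2], colour = "red") +
  annotate("point", x = c(1, 2), y = med, colour = "red", size = 3) +
  annotate("text", x = 1.5, y = 65, label = paste("p =", signif(pval, 2))) +
  # Continuous scale with tight limits reduces the blank space at the sides
  scale_x_continuous(name = "Intervention", breaks = c(1, 2),
                     labels = c("T1", "T4"), limits = c(0.8, 2.2)) +
  scale_y_continuous(name = "Var") +
  theme_bw()
```

The trade-off is that every additional time point needs its own segment and point layers, which is the repetition the long format avoids.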
