My data looks like this:

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),          #patient ID
                    t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),    #evaluation before
                    t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),    #evaluation after
                    sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))            #subset

I would like to do a simple before-after graph.

So far, I managed to get this: ugly before-after plot

With this:

library(ggplot2)
ggplot(mydata) + 
  geom_segment(aes(x = 1, xend = 2, y = t1, yend = t4), size=0.6) +
  scale_x_discrete(name = "Intervention", breaks = c("1", "2"), labels = c("T1", "T4"), limits = c(1, 2)) +
  scale_y_continuous(name = "Var") + theme_bw()

I am facing multiple issues. Can you help me to...

  • add a black circle at the beginning and the end of every line? (geom_point() doesn't work)
  • make the lines smoother (look how pixelated they are, especially the second one)?
  • decrease the blank space on the left and right sides of the graph?
  • add the median for T1 and T4 (in red), link those points, compare them with a paired Mann-Whitney test and print the p-value on the graph?

I would prefer not to reshape my data to long format, as I have a lot of other variables and timepoints (not shown here). I have read other posts (such as here), but the solutions provided look so complicated for something that seems simple (yet I can't do it...). Huge thanks for your help!

I will update the graph along with progression :)

EDIT

I would like not to reformat my database to long format as I have a lot of other variables and timepoints (not shown here)...

B_slash_
    The pixellation probably is due to the RStudio device on Windows machines. Probably if you use `ggsave()`, it is antialiased properly. – teunbrand May 06 '20 at 11:20
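
To check the suggestion in the comment above, here is a minimal sketch of writing a plot to a file with `ggsave()` instead of judging it in the RStudio viewer. The toy data, the file name and the size and resolution values are assumptions chosen for illustration, not taken from the question.

```r
library(ggplot2)

# Toy plot with two before/after lines (illustrative values only)
toy <- data.frame(x = c(1, 2, 1, 2), y = c(37, 33, 66, 45), id = c(1, 1, 2, 2))
p <- ggplot(toy, aes(x, y, group = id)) +
  geom_line() +
  theme_bw()

# Hypothetical output path; width and height are in inches by default
out_file <- file.path(tempdir(), "before_after.png")
ggsave(out_file, plot = p, width = 4, height = 4, dpi = 300)
```

Opening the saved PNG should show whether the jagged lines come from the on-screen device rather than from the plot itself.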

2 Answers

Here is what I would do! Please feel free to ask questions about what's going on here.

library(tidyverse)

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),          #patient ID
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),    #evaluation before
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),    #evaluation after
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))      

pval <- wilcox.test(x = mydata$t1, y = mydata$t4, paired = TRUE, exact = FALSE)$p.value %>% round(2)

df <- mydata %>% 
  pivot_longer(2:3,names_to = "Time") %>% # Pivot into long-format
  mutate(sexe = as.factor(sexe),
         Time = as.factor(Time)) # Make factors 

ggplot(df,aes(Time,value,color = sexe,group = ID)) + 
  geom_point() + 
  geom_line() + 
  stat_summary(inherit.aes = F,aes(Time,value),
    geom = "point", fun = "median", col = "red", 
    size = 3, shape = 24,fill = "red"
  ) +
  annotate("text", x = 1.7, y = 60, label = paste('P-Value is',pval)) + 
  coord_cartesian(xlim = c(1.4,1.6)) +
  theme_bw()
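
The two numbers the plot relies on can be checked on their own in base R. This is a minimal sketch that assumes the "paired Mann-Whitney test" from the question means the Wilcoxon signed-rank test, which is what `wilcox.test()` computes with `paired = TRUE`. The vector names `t1`, `t4` and `medians` are only illustrative.

```r
# Before/after values copied from the question
t1 <- c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47)
t4 <- c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42)

# Medians per time point (these are what the red triangles show)
medians <- c(T1 = median(t1), T4 = median(t4))
medians

# Paired test; exact = FALSE requests the normal approximation,
# which avoids the warning R gives when the differences contain ties
res <- wilcox.test(t1, t4, paired = TRUE, exact = FALSE)
pval <- round(res$p.value, 2)
pval
```

Both columns are read straight from the wide layout here, so the test itself does not need the pivot step.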

Also be aware that long-format data commonly carries other variables that simply repeat across time points. See the example here:

mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),          #patient ID
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),    #evaluation before
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),    #evaluation after
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1),
                     var1 = c(1:10),
                     var2 = c(1:10),
                     var3 = c(1:10))


df <- mydata %>% 
  pivot_longer(2:3,names_to = "Time") %>% # Pivot into long-format
  mutate(sexe = as.factor(sexe),
         Time = as.factor(Time))
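
To see what the pivot does to the extra columns, here is a small sketch using only `tidyr`. It assumes a single illustrative extra column `var1`, and the object name `long` is hypothetical.

```r
library(tidyr)

# Wide data from the question plus one extra per-patient column
mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1),
                     var1 = 1:10)

# Only t1 and t4 are stacked; every other column is carried along
long <- pivot_longer(mydata, c(t1, t4), names_to = "Time")

nrow(long)      # two rows per patient
names(long)     # ID, sexe, var1, Time, value
table(long$ID)  # each ID appears twice
```

The original `mydata` is left untouched, so the long copy can be built just for plotting while the rest of the analysis keeps using the wide table.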


Magnus Nordmo
  • Nice solution :) Consider adding a few explanations, even without questions from the asker, so it is easier for future viewers to understand. – RaV May 06 '20 at 09:29
  • Thanks for providing this solution. My problem is that I don't want to reformat my database to long format (this is a shortened version of my data frame, but I have a lot of data and measurements): I would prefer not to have to duplicate every line... Is there another solution? – B_slash_ May 06 '20 at 12:01
  • Hmm. I don't know if there is a good solution if you want the data to stay in the wide format. Please be aware that `tidyr::pivot_longer()` can handle very large datasets. If you are working with databases you can use `dbplyr`. If you have a lot of individual datasets, you can write a function that does the work in a single call. – Magnus Nordmo May 06 '20 at 12:49
  • See my edit. Long format data is probably more flexible than you think! – Magnus Nordmo May 06 '20 at 15:29

I can address issue (1), the black circles:

First, you should tidy your data, so that one column holds the information for one variable (right now the 'Var' values on the plot are stored in two columns: 't1' and 't4'). You can achieve this with the tidyr package.

library(tidyr)
mydata_long <- pivot_longer(mydata, c(t1, t4), names_to = "t")

Now creating points is easy, and the rest of the code becomes a lot clearer: we can tell ggplot that we want the 't' groups on the x-axis and their values on the y-axis, and that we want a separate line for every 'ID'.

ggplot(mydata_long) +
  geom_line(aes(x = t, y = value, group = ID)) + # plotting lines, one per patient
  geom_point(aes(x = t, y = value)) + # plotting points
  labs(x = "Intervention", y = "Var") + # changing labels
  theme_bw()

Result plot

RaV
  • Thanks for providing this solution. My problem is that I don't want to reformat my database to long format (this is a shortened version of my data frame, but I have a lot of data and measurements): I would prefer not to have to duplicate every line... Is there another solution? – B_slash_ May 06 '20 at 12:01
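
For readers who want to keep the wide layout, here is a minimal sketch of one possible way to do it. It is not taken from either answer. It assumes the x-axis is treated as continuous with T1 at position 1 and T4 at position 2, and the names `med` and `p_wide` are hypothetical helpers introduced only for this example.

```r
library(ggplot2)

# Same wide data as in the question
mydata <- data.frame(ID = c(1, 2, 3, 5, 6, 7, 9, 11, 12, 13),
                     t1 = c(37, 66, 28, 60, 44, 24, 47, 44, 33, 47),
                     t4 = c(33, 45, 27, 39, 24, 29, 24, 37, 27, 42),
                     sexe = c(1, 2, 2, 1, 1, 1, 2, 2, 2, 1))

# Paired Wilcoxon signed-rank test computed directly on the two columns
pval <- round(wilcox.test(mydata$t1, mydata$t4,
                          paired = TRUE, exact = FALSE)$p.value, 2)

# Medians of both time points in a tiny two-row helper data frame
med <- data.frame(x = c(1, 2),
                  y = c(median(mydata$t1), median(mydata$t4)))

p_wide <- ggplot(mydata) +
  geom_segment(aes(x = 1, xend = 2, y = t1, yend = t4)) +  # one line per patient
  geom_point(aes(x = 1, y = t1)) +                         # circles at T1
  geom_point(aes(x = 2, y = t4)) +                         # circles at T4
  geom_line(data = med, aes(x = x, y = y), colour = "red") +
  geom_point(data = med, aes(x = x, y = y), colour = "red", size = 3) +
  annotate("text", x = 1.5, y = max(mydata$t1), label = paste("p =", pval)) +
  # A continuous scale with narrow limits trims the blank space on both sides
  scale_x_continuous(name = "Intervention", breaks = c(1, 2),
                     labels = c("T1", "T4"), limits = c(0.8, 2.2)) +
  scale_y_continuous(name = "Var") +
  theme_bw()

p_wide
```

The trade-off is that each time point needs its own `geom_point()` call, so this approach gets verbose with many timepoints, which is where the long-format answers above scale better.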