Reshaping model output into ggplot friendly format

Question

Consider the following data frame:

set.seed(123)
dat1 <- data.frame(Loc = rep(c("a","b","c","d","e","f","g","h"),each = 5),
                   ID = rep(c(1:10), each = 2),
                   var1 = rnorm(200),
                   var2 = rnorm(200),
                   var3 = rnorm(200),
                   var4 = rnorm(200),
                   var5 = rnorm(200),
                   var6 = rnorm(200))
dat1$ID <- factor(dat1$ID)

I am using the RVAideMemoire package to perform permutation ANOVAs.

library(RVAideMemoire)
perm <- multtest.gp(dat1[,3:8], dat1[,1], test = "perm")

The output provides access to the mean and SE for each Loc through the list element tab:

a <- perm$tab

I would like to plot the mean for each group (geom_point) +/- the standard error, faceted by variable. What is the simplest way to get `a` into a ggplot-friendly format for this graph, while labelling the plots with the original Loc names from dat1 (the columns are labeled mean.x, and SE.n)?

Adam Quek · Answer 1 · 2020-06-27T01:41:27.693

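# packages used below: dplyr (%>%, select, mutate, left_join), tidyr (gather), ggplot2
library(dplyr)
library(tidyr)
library(ggplot2)

# write the rownames of perm$tab as a variable ID in the data.frame.
# Note: this is not the same ID as the one in the original data.frame (dat1).
df <- data.frame(ID = row.names(a), a)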

# To have a usable df to plot mean +/- se, you would need to have a data.frame in the format of:
#     ID     Loc           Mean        SE         Min         Max
#   var1       a     mean_value  se_value    mean - se  mean + se
#   ....

# there are various ways to build this. I'm using the older method of splitting the single data.frame into two sub-frames and then merging them into one.

# the first sub-frame takes only the mean values and ignores the se values

# collapse all the columns starting with Mean into long format
df1 <- df %>% gather(Loc, mean, Mean.a:Mean.h) %>% select(ID, Loc, mean)
# note: pivot_longer is now the recommended approach, but gather is only superseded and still works
df1 %>% head
#     ID    Loc       mean
# 1 var1 Mean.a  0.1782400
# 2 var2 Mean.a  0.1755200
# 3 var3 Mean.a  0.0097919
# 4 var4 Mean.a -0.1796800
# 5 var5 Mean.a  0.3598900
# 6 var6 Mean.a -0.2262200

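# remove the prefix "Mean."; the pattern is anchored and the dot escaped,
# because an unescaped "." in a regex matches any character
df1 <- df1 %>% mutate(Loc = sub("^Mean\\.", "", Loc))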

# the second sub-frame takes only the se values and ignores the mean values

df2 <- df %>% gather(Loc, se, SE.a:SE.h) %>% select(ID, Loc, se)
# remove the prefix "SE." (anchored, with the dot escaped)
df2 <- df2 %>% mutate(Loc = sub("^SE\\.", "", Loc))
df2 %>% head

#     ID Loc      se
# 1 var1   a 0.22419
# 2 var2   a 0.18539
# 3 var3   a 0.16239
# 4 var4   a 0.17894
# 5 var5   a 0.17129
# 6 var6   a 0.18997

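# combine the two sub-frames into a usable format for plotting
# (naming the join keys explicitly avoids the "Joining, by = ..." message)
plot.dat <- df1 %>% left_join(df2, by = c("ID", "Loc"))

# generate the min and max variables. I'm using 1.96 x se here, which gives an
# approximate 95% interval; use 1 x se instead for mean +/- one standard error.
plot.dat <- plot.dat %>% mutate(min = mean - 1.96*se, max = mean + 1.96*se)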

plot.dat %>% head

#    ID Loc       mean      se        min       max
# 1 var1   a  0.1782400 0.22419 -0.2611724 0.6176524
# 2 var2   a  0.1755200 0.18539 -0.1878444 0.5388844
# 3 var3   a  0.0097919 0.16239 -0.3084925 0.3280763
# 4 var4   a -0.1796800 0.17894 -0.5304024 0.1710424
# 5 var5   a  0.3598900 0.17129  0.0241616 0.6956184
# 6 var6   a -0.2262200 0.18997 -0.5985612 0.1461212


# standard plotting commands with facet_wrap
ggplot(plot.dat, aes(Loc, mean)) + 
      geom_point() + 
      geom_errorbar(aes(ymin = min, ymax=max)) + 
      facet_wrap(~ID)
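
The comments above mention pivot_longer as the suggested replacement for gather. A minimal sketch of that route follows, assuming the columns of perm$tab follow the `Mean.<Loc>` / `SE.<Loc>` pattern shown in the output above. `a_demo` is a hypothetical stand-in for perm$tab so the block runs without RVAideMemoire, and the bars here are mean +/- one SE, as asked in the question, rather than 1.96 x SE.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for perm$tab: two variables, two locations.
# The real table may carry extra columns; only Mean.* and SE.* are reshaped.
a_demo <- data.frame(
  Mean.a = c(0.10, 0.20), Mean.b = c(0.30, 0.40),
  SE.a   = c(0.01, 0.02), SE.b   = c(0.03, 0.04),
  row.names = c("var1", "var2")
)

plot_demo <- data.frame(ID = row.names(a_demo), a_demo) %>%
  pivot_longer(
    cols = matches("^(Mean|SE)\\."),
    # ".value" turns the first captured group (Mean or SE) into column names;
    # the second captured group becomes the Loc column.
    names_to = c(".value", "Loc"),
    names_pattern = "^(Mean|SE)\\.(.*)$"
  ) %>%
  mutate(min = Mean - SE, max = Mean + SE)

plot_demo
```

This does the split, strip-prefix, and join steps in one call, producing one row per ID and Loc with `Mean` and `SE` columns. The ggplot call is the same as above, with `Mean` in place of `mean`.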

Thanks for the help, that worked. I am trying to understand what you did, and have two questions. First, I take it `var` is a call that dplyr knows is supposed to mean "columns", or something similar; is this correct? Second, in my example the `ID`s are single letters to begin with. When I try this whole procedure on my real data, where the `ID`s have longer names, `multtest.gp` shortens the names to single or double letters (after `mean.` and `SE.`, if that makes sense). How can I get the full names back in that situation? - Ryan, Jun 26 '20 at 12:44
@Ryan I've annotated the logic and changed `var` to `Loc` to avoid confusion. I do not understand your second question though. - Adam Quek, Jun 27 '20 at 01:42
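
On the shortened names raised in the comments: one possible workaround, sketched here as an assumption rather than as a feature of `multtest.gp`, is to compute the per-group mean and SE directly from the raw data, so the original group labels never pass through the abbreviated column names. `raw_demo` and its site names are hypothetical, and the SE is taken as sd / sqrt(n), which may not be exactly what the package reports.

```r
library(dplyr)
library(tidyr)

set.seed(123)
# Hypothetical raw data with long group labels
raw_demo <- data.frame(
  Loc  = rep(c("north_site", "south_site"), each = 10),
  var1 = rnorm(20),
  var2 = rnorm(20)
)

summary_demo <- raw_demo %>%
  # one row per observation and variable
  pivot_longer(cols = starts_with("var"), names_to = "ID", values_to = "value") %>%
  group_by(ID, Loc) %>%
  # assumed SE definition: sample sd divided by sqrt of group size
  summarise(mean = mean(value), se = sd(value) / sqrt(n()), .groups = "drop") %>%
  mutate(min = mean - se, max = mean + se)

summary_demo
```

The result has the same ID / Loc / mean / se / min / max layout as plot.dat, so the same ggplot call applies, and the facets and axis carry the full labels.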
