
I am trying to plot a bar graph in R with four independent variables, time (T1, T2), group (1, 2, 3, 4, 5), distance (far and near) and cue (valid and invalid), with RT as the dependent variable. To do this, I have used the following code:

library(ggplot2)

ggplot(b, aes(x=cue, y=RT, fill = cue))+
  geom_bar(stat="identity", position = position_dodge(),  width = .9)+
  facet_grid(group~time,  space="free_x") +
  geom_errorbar(aes(ymin= RT-se, ymax = RT+se), width = 0.2, color = "BLACK", position=position_dodge())+
  coord_cartesian(ylim = c(200,1500))+theme(legend.title = element_blank())

When I run this code in R, I get the following graph:

[Plot: bar plot of RT by cue, faceted by group and time]

Is it possible to reorder cue (valid/invalid) as well as distance (near/far) in descending order, with both done at the same time?

The error bars seem to be off-centre; how do I fix that? Also, can I statistically compare two conditions (for example, valid vs. invalid in group 1 at time 1) and mark the result on the graph?


The data set looks something like this for each participant:

participant cue distance RT time group
P1 valid far 1461 T1 4
P1 invalid near 1416 T1 4
P1 invalid near 1409 T1 4
P1 invalid far 1351 T1 4

#------ Updated query

I have updated the plot (see the new plot). The error bars are now too small to see. Why is that?

I want to compare the valid and invalid conditions within each category, that is, valid vs. invalid for the near and far conditions in each group.

This is the code I have used so far:

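library(plyr)       # ddply() and rename(), used inside summarySE()
library(tidyverse)  # dplyr, forcats (fct_rev) and ggplot2; loaded after plyr so dplyr's verbs win

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,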
                      conf.interval=.95, .drop=TRUE) {
  
  
  # New version of length which can handle NA's: if na.rm==T, don't count them 
  length2 <- function (x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else       length(x)
  }
  
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop=.drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm=na.rm),
                     mean = mean   (xx[[col]], na.rm=na.rm),
                     sd   = sd     (xx[[col]], na.rm=na.rm)
                   )
                 },
                 measurevar
  )
  
  # Rename the "mean" column    
  datac <- plyr::rename(datac, c("mean" = measurevar))
  
  datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean
  
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval: 
  # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
  ciMult <- qt(conf.interval/2 + .5, datac$N-1)
  datac$ci <- datac$se * ciMult
  
  return(datac)
}

data<- read.table("trialdata.csv", header=TRUE, sep=",")

b<- summarySE(data, measurevar="RT", groupvars=c("cue", "distance", "time", "group"))


b %>% 
  mutate(cue = fct_rev(cue)) %>% mutate(distance = fct_rev(distance))%>%
ggplot( aes(x=distance, y=RT, fill = cue))+
  geom_bar(stat="identity", position = "dodge", width = 0.5)+
  facet_grid(group~time,  space="free_x") +
  geom_errorbar(aes(ymin= RT - se, ymax = RT + se), width = 0.08, color = "BLACK", position = position_dodge(0.5))+
    scale_fill_manual(values = c( "grey",  "dimgrey" ), 
                    labels = c("valid", "invalid"))

What more should I do to include the statistical comparisons?

Christina
    Can you post your original data? I've made a dummy dataset, but can't reproduce your problem. Maybe it's a data cleaning issue. – masher Jan 27 '21 at 06:09
  • The group number is all the same? – masher Jan 27 '21 at 07:14
  • This data is for just one participant. There are around 20 participants, and each participant has around 360 trials (180 in T1, 180 in T2). – Christina Jan 27 '21 at 07:15
  • Now you've got 9 entries with identical participant, cue, distance, time, and group! Did you want to plot their average with a standard deviation? – masher Jan 27 '21 at 07:20
  • @masher what can be done to incorporate the statistics? – Christina Jan 27 '21 at 10:37
  • The error bars are that small because the intervals really are that small: they represent the 95% CI of the mean. What other statistics do you want to incorporate? – masher Jan 28 '21 at 01:22
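A rough sketch of why the bars shrink: summarySE() divides the SD by the square root of N, and with trial-level data N is every trial pooled into a cell, so with hundreds of trials per cell the SE (which the updated plot draws as RT ± se) becomes small next to RTs of several hundred ms. The SD of 150 ms and the N values below are assumptions chosen only for illustration, not numbers from the question's data.

```r
# Assumed values for illustration only (not taken from the question's data).
rt_sd <- 150               # hypothetical RT standard deviation in ms
n <- c(10, 100, 1000)      # hypothetical number of trials pooled into one cell

se <- rt_sd / sqrt(n)                # standard error of the mean, as in summarySE()
ci <- se * qt(0.975, df = n - 1)     # half-width of a 95% t-based CI, as in summarySE()

data.frame(n, se = round(se, 1), ci = round(ci, 1))
```

Because trials from the same participant are not independent, a common alternative is to average each participant's trials first and compute the SE across participant means, which will usually give wider bars.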

1 Answer


#--------------------------

Answer after question edits.

Error bar alignment is controlled by position_dodge(): the dodge width given to geom_errorbar() has to match the bar width used in geom_bar(). Reordering invalid/valid is done with fct_rev(). Which statistical comparison to add depends on what you actually want to show; decide that first, then work out how you want to display it.
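A minimal sketch of the alignment point, using a small made-up summary table (all numbers are hypothetical). The idea is that both layers share one position_dodge() object whose width equals the bar width; x = distance and fill = cue mirror the question's updated plot.

```r
library(ggplot2)

# Hypothetical summary table: one row per cue x distance cell (values are made up).
summ <- data.frame(
  cue      = factor(c("valid", "invalid", "valid", "invalid"), levels = c("valid", "invalid")),
  distance = factor(c("near", "near", "far", "far"), levels = c("near", "far")),
  RT       = c(400, 430, 410, 450),
  se       = c(5, 6, 5, 7)
)

bar_width <- 0.5
dodge <- position_dodge(width = bar_width)  # one dodge object shared by both layers

p <- ggplot(summ, aes(x = distance, y = RT, fill = cue)) +
  geom_col(position = dodge, width = bar_width) +
  geom_errorbar(aes(ymin = RT - se, ymax = RT + se),
                width = 0.1,            # whisker width can differ from the bar width
                position = dodge)       # but the dodge width must be the same

p
```

If the error bars use position_dodge() with no width, ggplot2 dodges them by their own, much narrower width, which is what pushes them away from the bar centres.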

library(tidyverse)
b <- tribble(
  ~participant, ~cue, ~distance, ~RT, ~time, ~group,
  "P1", "valid",    "far",  1461,   "T1",   4,
  "P1", "invalid",  "near", 1416,   "T1",   4,
  "P1", "invalid",  "near", 1409,   "T1",   4,
  "P1", "invalid",  "far",  1351,   "T1",   4,
  "P1", "invalid",  "far",  1391,   "T1",   4,
  "P1", "invalid",  "far",  1365,   "T1",   4,
  "P1", "invalid",  "far",  1385,   "T1",   4,
  "P1", "invalid",  "near", 1465,   "T1",   4,
  "P1", "valid",    "near", 1451,   "T1",   4,
  "P1", "valid",    "near", 1397,   "T1",   4,
  "P1", "valid",    "far",  1466,   "T1",   4,
  "P1", "invalid",  "far",  1411,   "T1",   4,
  "P1", "invalid",  "near", 1439,   "T1",   4,
  "P1", "valid",    "far",  1328,   "T1",   4,
  "P1", "valid",    "far",  1437,   "T1",   4,
  "P1", "valid",    "far",  1376,   "T1",   4,
  "P1", "invalid",  "far",  1364,   "T1",   4,
  "P1", "invalid",  "near", 1451,   "T1",   4,
  "P1", "valid",    "far",  1461,   "T1",   4,
  "P1", "invalid",  "far",  1441,   "T1",   4,
  "P1", "valid",    "near", 1491,   "T1",   4,
  "P1", "valid",    "near", 1385,   "T1",   4,
  "P1", "valid",    "near", 1553,   "T1",   4,
  "P1", "invalid",  "far",  1484,   "T1",   4,
  "P1", "valid",    "far",  1449,   "T1",   4,
  "P1", "invalid",  "near", 1361,   "T1",   4,
  "P1", "invalid",  "near", 1399,   "T1",   4,
  "P1", "invalid",  "near", 1389,   "T1",   4,
  "P1", "valid",    "near", 1378,   "T1",   4,
  "P1", "valid",    "near", 1365,   "T1",   4,
  "P1", "valid",    "far",  1465,   "T1",   4,
  "P1", "valid",    "near", 1333,   "T1",   4,
  "P1", "valid",    "near", 1340,   "T1",   4,
  "P1", "invalid",  "far",  1347,   "T1",   4,
  "P1", "valid",    "far",  1375,   "T1",   4,
  "P1", "valid",    "near",  390,   "T2",   4,
  "P1", "invalid",  "far",   394,   "T2",   4,
  "P1", "invalid",  "near",  374,   "T2",   4,
  "P1", "valid",    "far",   363,   "T2",   4,
  "P1", "valid",    "near",  342,   "T2",   4,
  "P1", "invalid",  "far",   421,   "T2",   4,
  "P1", "invalid",  "near",  398,   "T2",   4,
  "P1", "invalid",  "near",  419,   "T2",   4
)


b %>% 
  group_by(participant, cue, distance, time, group) %>% 
  summarise(RT_mean = mean(RT), 
            RT_sd = sd(RT)) %>% 
  filter(participant == "P1") %>% #not strictly necessary in this instance, but will be in general.
  mutate(cue = fct_rev(cue)) %>% 
ggplot(aes(x=cue, y=RT_mean, fill = distance))+
  geom_bar(stat="identity", position = position_dodge(),  width = .9)+
  facet_grid(group~time,  space="free_x") +
  geom_errorbar(aes(ymin= RT_mean - RT_sd, ymax = RT_mean + RT_sd), 
                width = 0.2, color = "BLACK", 
                position=position_dodge(0.9))+ #the 0.9 here should be the same value as the width in geom_bar
                                              #  to keep the error bars centred.
  coord_cartesian(ylim = c(200,1500))+theme(legend.title = element_blank())
#> `summarise()` regrouping output by 'participant', 'cue', 'distance', 'time' (override with `.groups` argument)

Created on 2021-01-27 by the reprex package (v0.3.0)
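One possible way to add the comparison, sketched under assumptions: participants are the unit of analysis, cue varies within participants, and a paired t-test of valid vs. invalid is run separately in each group x time x distance cell. The trial data below are simulated purely so the code runs; swap in the real data frame. This is only one choice: it makes no correction for multiple comparisons, and a repeated-measures ANOVA or mixed model over all factors may be more appropriate.

```r
library(tidyverse)

# Simulated, hypothetical trial-level data so the sketch runs on its own.
set.seed(1)
trials <- expand_grid(
  participant = paste0("P", 1:4),
  group = 4,
  time = c("T1", "T2"),
  distance = c("far", "near"),
  cue = c("valid", "invalid"),
  trial = 1:10
) %>%
  mutate(RT = rnorm(n(), mean = 400, sd = 50))

# Step 1: one mean per participant and condition.
subj_means <- trials %>%
  group_by(participant, group, time, distance, cue) %>%
  summarise(RT = mean(RT), .groups = "drop")

# Step 2: paired t-test of valid vs. invalid within each group x time x distance cell.
tests <- subj_means %>%
  pivot_wider(names_from = cue, values_from = RT) %>%
  group_by(group, time, distance) %>%
  summarise(p_value = t.test(valid, invalid, paired = TRUE)$p.value,
            .groups = "drop")

# Step 3: bar heights and SEs across participants, plus a label position per cell.
cell_summary <- subj_means %>%
  group_by(group, time, distance, cue) %>%
  summarise(mean_RT = mean(RT), se = sd(RT) / sqrt(n()), .groups = "drop")

p_labels <- cell_summary %>%
  group_by(group, time, distance) %>%
  summarise(y = max(mean_RT + se) + 20, .groups = "drop") %>%
  left_join(tests, by = c("group", "time", "distance")) %>%
  mutate(label = paste0("p = ", signif(p_value, 2)))

# Step 4: draw the bars and write each cell's p-value above its pair of bars.
p_stats <- ggplot(cell_summary, aes(x = distance, y = mean_RT, fill = cue)) +
  geom_col(position = position_dodge(0.9), width = 0.9) +
  geom_errorbar(aes(ymin = mean_RT - se, ymax = mean_RT + se),
                position = position_dodge(0.9), width = 0.2) +
  geom_text(data = p_labels, aes(x = distance, y = y, label = label),
            inherit.aes = FALSE, size = 3) +
  facet_grid(group ~ time)

p_stats
```

Packages such as ggsignif can draw brackets with stars instead of plain text labels.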

#--------------------------

Original answer

I've made a dummy dataset and think you may have a data quality issue. Note that the first two rows of b have the same cue, time, and group but different RT values. Can you post your original data?
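A quick way to check for this kind of problem is to count the rows in each plotted combination. This sketch re-creates the dummy data from this answer; any combination with n > 1 has more than one row for what should be a single bar.

```r
library(tidyverse)

# Dummy data from this answer (the ci column is omitted; it is not needed here).
b <- tibble(
  cue   = c("invalid", "invalid", "valid", "invalid", "valid", "invalid", "valid", "invalid", "valid"),
  time  = c("T1", "T1", "T1", "T2", "T2", "T1", "T1", "T2", "T2"),
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  RT    = c(1000, 1200, 1300, 400, 500, 700, 800, 300, 400)
)

# Count rows per cue/time/group combination and keep only the duplicated ones.
dups <- b %>%
  count(cue, time, group) %>%
  filter(n > 1)

dups
```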

The ordering of "valid"/"invalid" can be reversed using the forcats package, as in my second example.
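A minimal sketch of what fct_rev() does to the level order (the vectors are short illustrative examples). Character vectors become factors with alphabetical levels, so reversing them gives the descending order asked about for both cue and distance; fct_relevel() is the alternative when you want to spell out the order explicitly.

```r
library(forcats)

# Short illustrative vectors; factor() would sort their levels alphabetically.
cue <- c("valid", "invalid", "invalid", "valid")
distance <- c("far", "near", "near", "far")

levels(factor(cue))          # "invalid" "valid"
levels(fct_rev(cue))         # "valid" "invalid"
levels(fct_rev(distance))    # "near" "far"

# fct_relevel() sets an explicit order instead of just reversing the default one.
levels(fct_relevel(distance, "near", "far"))   # "near" "far"
```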

library(tidyverse)
cue <- c("invalid",   "invalid","valid","invalid","valid","invalid","valid","invalid","valid")
time <- c("T1",           "T1","T1","T2","T2","T1","T1","T2","T2")
group <- c(1,                1,1,1,1,2,2,2,2)
RT <- c(1000,             1200,1300,400,500,700,800,300,400)
ci <- c(50,                100,100,100,100,50,50,50,50)

b <- tibble(cue,time,group,RT,ci)

b
#> # A tibble: 9 x 5
#>   cue     time  group    RT    ci
#>   <chr>   <chr> <dbl> <dbl> <dbl>
#> 1 invalid T1        1  1000    50
#> 2 invalid T1        1  1200   100
#> 3 valid   T1        1  1300   100
#> 4 invalid T2        1   400   100
#> 5 valid   T2        1   500   100
#> 6 invalid T1        2   700    50
#> 7 valid   T1        2   800    50
#> 8 invalid T2        2   300    50
#> 9 valid   T2        2   400    50

ggplot(b,aes(x=cue, y=RT, fill = cue))+
  geom_bar(stat="identity", position = position_dodge(),  width = .9)+
  facet_grid(group~time,  space="free_x") +
  geom_errorbar(aes(ymin= RT - ci, ymax = RT + ci), width = 0.2, color = "BLACK", position=position_dodge())+
  coord_cartesian(ylim = c(200,1500))+theme(legend.title = element_blank())



#reverse the order of the "invalid"/"valid"
b %>% 
  mutate(cue = fct_rev(cue)) %>% 
ggplot(aes(x=cue, y=RT, fill = cue))+
  geom_bar(stat="identity", position = position_dodge(),  width = .9)+
  facet_grid(group~time,  space="free_x") +
  geom_errorbar(aes(ymin= RT - ci, ymax = RT + ci), width = 0.2, color = "BLACK", position=position_dodge())+
  coord_cartesian(ylim = c(200,1500))+theme(legend.title = element_blank())

Created on 2021-01-27 by the reprex package (v0.3.0)

masher
  • I am sorry, I completely missed a variable in the dataset. I need to add one more independent variable, distance (far and near). I will edit the question and add the details. – Christina Jan 27 '21 at 06:44
  • That will be the issue. You'd need to filter for that before plotting. – masher Jan 27 '21 at 06:53
  • I have filtered the data and removed the outliers. How can I add the new variable? – Christina Jan 27 '21 at 07:07
  • I tried the mutate fn, but received an error. `Error: Problem with `mutate()` input `cue`. x could not find function "fct_rev" i Input `cue` is `fct_rev(cue)`.` – Christina Jan 27 '21 at 07:08
  • Did you run `library(tidyverse)`? It looks like you haven't loaded the forcats package. – masher Jan 27 '21 at 07:15
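If attaching the whole tidyverse is not wanted, calling the function through its namespace also works, assuming forcats is installed. A minimal sketch with a small two-row example table:

```r
library(dplyr)   # provides mutate() and re-exports %>%

# Small illustrative table; cue is a plain character column.
b <- tibble::tibble(cue = c("invalid", "valid"), RT = c(1000, 1300))

# forcats:: calls fct_rev() without attaching forcats, avoiding "could not find function".
b2 <- b %>% mutate(cue = forcats::fct_rev(cue))

levels(b2$cue)
```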