
I am looking for help with my code. I have a dataset in which many people were asked to score 5 different scenarios on a scale from -5 to +5.

I then split the respondents into two groups, S and A, because I want to compare them. The dataset is below:

Score<-c(-2, 3, 4, -1, 3, 4, 5, -1, 3, 5, -3, 3, 5, 1, -4, 5, -2, 
         1, 3, 4, -4, 2, -1, 3, 4)

Group<-c( "S", "S", "A", "S", "A", "S", "A", "S", "S", "A", "S", "A", "S", "A", 
          "S", "S", "A", "S", "A", "S", "A", "S", "S", "A", "S"
           )

Scenerio_ID <-c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 
                1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)


CombinedTable<-data.frame(Score,Group, Scenerio_ID )

I am trying to perform the following:

  1. Create a new column or table with the mean score for each Scenerio_ID and the 95% confidence interval (upper and lower limits) for both group "A" and group "S".

  2. Draw a graph with Score on the y axis and scenario on the x axis, showing Group A and Group S for each scenario. For each I want the mean as a point with the upper and lower 95% confidence limits, very similar to the picture I have attached.


My attempt at calculating the mean and confidence interval for each scenario within each group is this:

library(dplyr)
library(Rmisc)
library(ggplot2)

MeansCombinedTable <- 
  CombinedTable %>%
  group_by(Group, Scenerio_ID) %>%
  dplyr::summarise(avg_Score = mean(Score), 
                   uci_Score = CI(Score)[1], 
                   lci_Score = CI(Score)[3]) %>%
  mutate(Scenerio_ID = Scenerio_ID %>% as.factor())

My attempt at drawing the plot is this:

CombinedTable %>%
  ggplot(aes(x = Group, y = avg_Score, fill = Scenerio_ID)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = lci_Score, ymax = uci_Score), position = "dodge")

I would be incredibly grateful for any help with this, as I have been working on it for a while and it is giving me a lot of trouble. Many thanks in advance.

1 Answer

You are summarising the data into MeansCombinedTable but then passing the original CombinedTable to ggplot, which has no avg_Score, lci_Score or uci_Score columns. That's the main problem.

suppressPackageStartupMessages({
  library(dplyr)
  library(Rmisc)
  library(ggplot2)
})

MeansCombinedTable <- 
  CombinedTable %>%
  group_by(Group, Scenerio_ID) %>%
  dplyr::summarise(avg_Score = mean(Score), 
                   uci_Score = CI(Score)[1], 
                   lci_Score = CI(Score)[3],
                   .groups = "drop") %>%
  mutate(Scenerio_ID = paste("Scenerio", Scenerio_ID))
#> Warning: There were 4 warnings in `dplyr::summarise()`.
#> The first warning was:
#> ℹ In argument: `uci_Score = CI(Score)[1]`.
#> ℹ In group 1: `Group = "A"`, `Scenerio_ID = 1`.
#> Caused by warning in `qt()`:
#> ! NaNs produced
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.

MeansCombinedTable %>%
  ggplot(aes(x = Group, y = avg_Score, fill = Scenerio_ID)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = lci_Score, ymax = uci_Score), position = "dodge") +
  labs(x = "", y = "Score") +
  facet_wrap(~ Scenerio_ID, nrow = 1L, strip.position = "bottom") +
  theme_classic()

Created on 2023-02-20 with reprex v2.0.2
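
The "NaNs produced" warnings come from Group/Scenerio_ID cells that contain a single score: with n = 1 the standard deviation is undefined and qt() is called with 0 degrees of freedom. A minimal base R sketch to check this on the posted data (n_per_cell and single_obs are names introduced here for illustration only):

```r
# Data as posted in the question
Score <- c(-2, 3, 4, -1, 3, 4, 5, -1, 3, 5, -3, 3, 5, 1, -4, 5, -2,
           1, 3, 4, -4, 2, -1, 3, 4)
Group <- c("S", "S", "A", "S", "A", "S", "A", "S", "S", "A", "S", "A", "S", "A",
           "S", "S", "A", "S", "A", "S", "A", "S", "S", "A", "S")
Scenerio_ID <- rep(1:5, times = 5)
CombinedTable <- data.frame(Score, Group, Scenerio_ID)

# Number of scores in each Group x Scenerio_ID cell
n_per_cell <- aggregate(Score ~ Group + Scenerio_ID,
                        data = CombinedTable, FUN = length)
names(n_per_cell)[names(n_per_cell) == "Score"] <- "n"

# Cells with fewer than 2 scores cannot have a t-based interval
single_obs <- n_per_cell[n_per_cell$n < 2, ]
print(single_obs)

# Both ingredients of the interval are undefined for a single value
sd(4)                                # NA
suppressWarnings(qt(0.975, df = 0))  # NaN
```

Those cells still have a mean, but no usable confidence interval.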


Edit

To plot only some of the scenarios, first create a vector, wanted_scenerios, holding the IDs to keep; this makes the code more flexible. I slightly changed the summary code so that the string "Scenerio" is pasted onto Scenerio_ID only after filtering the data.
To plot the means as dots, just substitute geom_point for geom_bar and don't map fill to Scenerio_ID.

suppressPackageStartupMessages({
  library(dplyr)
  library(Rmisc)
  library(ggplot2)
})

wanted_scenerios <- c(1, 2, 4)

MeansCombinedTable <- 
  CombinedTable %>%
  group_by(Group, Scenerio_ID) %>%
  dplyr::summarise(avg_Score = mean(Score), 
                   uci_Score = CI(Score)[1], 
                   lci_Score = CI(Score)[3],
                   .groups = "drop") 
#> Warning: There were 4 warnings in `dplyr::summarise()`.
#> The first warning was:
#> ℹ In argument: `uci_Score = CI(Score)[1]`.
#> ℹ In group 1: `Group = "A"`, `Scenerio_ID = 1`.
#> Caused by warning in `qt()`:
#> ! NaNs produced
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.
  
MeansCombinedTable %>%
  filter(Scenerio_ID %in% wanted_scenerios) %>%
  mutate(Scenerio_ID = paste("Scenerio", Scenerio_ID)) %>%
  ggplot(aes(x = Group, y = avg_Score)) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = lci_Score, ymax = uci_Score), position = "dodge") +
  labs(x = "", y = "Score") +
  facet_wrap(~ Scenerio_ID, nrow = 1L, strip.position = "bottom") +
  theme_classic()

Created on 2023-02-21 with reprex v2.0.2
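
The question asked for the scenario on the x axis with both groups shown for each scenario. One possible variant is sketched below under stated assumptions: ci_row is a hypothetical helper (not part of Rmisc) that computes a t-based 95% interval in base R and returns NA limits for cells with a single score, and the dodge width of 0.5 and error bar width of 0.2 are arbitrary choices.

```r
library(ggplot2)

# Data as posted in the question
Score <- c(-2, 3, 4, -1, 3, 4, 5, -1, 3, 5, -3, 3, 5, 1, -4, 5, -2,
           1, 3, 4, -4, 2, -1, 3, 4)
Group <- c("S", "S", "A", "S", "A", "S", "A", "S", "S", "A", "S", "A", "S", "A",
           "S", "S", "A", "S", "A", "S", "A", "S", "S", "A", "S")
Scenerio_ID <- rep(1:5, times = 5)
CombinedTable <- data.frame(Score, Group, Scenerio_ID)

# Hypothetical helper: mean and t-based interval, NA limits when n < 2
ci_row <- function(x, level = 0.95) {
  n <- length(x)
  if (n < 2) {
    return(c(avg_Score = mean(x), lci_Score = NA_real_, uci_Score = NA_real_))
  }
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(avg_Score = mean(x), lci_Score = mean(x) - half, uci_Score = mean(x) + half)
}

# aggregate() returns the three statistics as a matrix column; flatten it
summ <- aggregate(Score ~ Group + Scenerio_ID, data = CombinedTable, FUN = ci_row)
summ <- cbind(summ[c("Group", "Scenerio_ID")], summ$Score)

# Same dodge for points and error bars so they stay aligned
dodge <- position_dodge(width = 0.5)

p <- ggplot(summ, aes(x = factor(Scenerio_ID), y = avg_Score, colour = Group)) +
  geom_point(size = 2, position = dodge) +
  geom_errorbar(aes(ymin = lci_Score, ymax = uci_Score),
                width = 0.2, position = dodge, na.rm = TRUE) +
  labs(x = "Scenario", y = "Score") +
  theme_classic()

print(p)
```

Here colour distinguishes the two groups, so the legend is needed; faceting by scenario as in the code above avoids it.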

Rui Barradas
  • Thanks so much for this. For some reason, when I tried to run the code it says "Error : unexpected '=' in : .groups= "drop") %>% mutate(Scenerio_ID =" – JamesLancaster Feb 21 '23 at 05:08
  • @JamesLancaster I cannot reproduce the error. Are you getting the error with the posted data? – Rui Barradas Feb 21 '23 at 07:17
  • Sorry, I've now made it work. Do you know how to show a dot where the mean is in each panel rather than the colour, as I don't need a legend? Similarly, how do I split the graph so I only see 3 scenarios? – JamesLancaster Feb 21 '23 at 14:53