Likert stacked bar chart in ggplot with pre and posttest

Question

I am brand new to R and a teacher, so thank you for your patience. I've searched many other questions on Likert stacked bar charts (this one is close, but not exactly what I am struggling with). I can't seem to find one that discusses how to pull the results of both survey pre- and post-test into the same stacked bar chart. I've read through Hayley's R for Data Science book, examples on GitHub, the R Companion Handbook, and the R Cookbook. Still really need some help, as a beginner.

I have a set of 12 student questions, each with a pre- and post-test response, on a scale of Strongly Agree to Strongly Disagree.

My question is: How did the student survey responses change before and after a test?

My data originally was displayed as:

Student sex(F=0,M=1)  PreTestQ1   PostTestQ1
1       0              Agree      Disagree
2       0              Disagree   Agree
3       1              Agree      Agree
4       1              Disagree   Agree

First, I converted the Agrees/Disagrees to numerical data (Strongly Agree = 1, Strongly Disagree = 4, no Neutral option) and tidied the data from wide to long using:

    # Set data frame as wide
msse_wide <- read_xls("ProcessDataMSSE.xls")
colnames(msse_wide) # Displays names of columns
head(msse_wide)


# Set data frame as long, after running wide code above
msse_long <- msse_wide %>%
  gather(question,obs_prepost, c(2:25)) # This pulls the columns from 2 to 25 (not including the "sex" column), test it out first as a precaution

# NOW MY DATA IS TIDY!!!! :)

And I got:

    > msse_long
# A tibble: 1,824 x 3
     sex question obs_prepost
   <dbl> <chr>          <dbl>
 1     0 1Pre               3
 2     0 1Pre               3
 3     0 1Pre               2
 4     0 1Pre               3
 5     0 1Pre               2
 6     0 1Pre               3
 7     0 1Pre               3
 8     0 1Pre               2
 9     0 1Pre               2
10     0 1Pre               4
# … with 1,814 more rows

Now I would like to visualize the percentages of Strongly Agree --- Strongly Disagree responses as a stacked bar chart, using percentage responses, AND comparing pre- and post-test as stacked bars one on top of the other (so, with 12 questions pre- and post-, I will have 24 total stacked bar charts).

The ultimate goal is similar to this example from R Companion: Simple Stacked Bar Chart ...except I am stuck on how to pull percentages out of my data, and compare pre- and post-tests one on top of the other.

that plot that you shared is very hard to read... the x-axis doesn't seem to have much to do with the percentages in each of the bars? Can you add the `dput(msse_long)` output to your question? and show any plotting code you have tried? — morgan121, Feb 27 '20 at 22:11
I would prefer having an x-axis that runs from 0% to 100%, but the basic form of the plot is what I am looking for. I am new to R so I do not know how to do plotting code yet. — Tara Lepore, Feb 27 '20 at 22:18

dario · Answer 1 · 2020-02-27T22:53:28.553

Something like this?:

Data:

msse_wide <- read.table(text='Student sex(F=0,M=1)  PreTestQ1   PostTestQ1
                              1       0              Agree      Disagree
                              2       0              Disagree   Agree
                              3       1              Agree      Agree
                              4       1              Disagree   Agree',
                        header=TRUE,
                        stringsAsFactors=FALSE)

Suggested solution using dplyr, tidyr ggplot2 and scales:

library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
msse_wide %>% 
  pivot_longer(cols = -c(Student, sex.F.0.M.1.),
               names_to = "Test") %>% 
  group_by(Test, value) %>% 
  summarise(N = n()) %>%
  mutate(Pct = N / sum(N)) %>% 
  ggplot(aes(Test, Pct, fill = value)) +
    geom_bar(stat="identity") +
    scale_y_continuous(labels = percent)

Edit:

Thanks to a comment from @dc37:

Adding

+ coord_flip()

to the above code gives:

Explanation:

Starting from the data in wide form we use pivot_longer , tidyrs successor of gather to get the desired structure.

Then we group by Test and value (the individual answer levels) and summarize by counting the cases in each group using dplyr's n function.

We then mutate (in this case, create) a column where we divide the count for each Test - value combination by the sum for each Test group (dplyr now only grouping by the first group, Test)

Finally we use ggplot2 to plot the data and scales to label the percent axis.

Thank you! This gets me closer to my goal, I wasn't aware of the pivot_longer function. — Tara Lepore, Feb 28 '20 at 17:41

score -1 · Answer 2 · answered Mar 25 '20 at 21:02

I've got something that seems to work, only the pre- and post-treatments are out of order. This is in base R. Thanks for your help!

if(!require(psych)){install.packages("psych")}
if(!require(likert)){install.packages("likert")}
library(readxl)
setwd("MSSE 507 Capstone Data Analysis/")
read_xls("ProcessDataMSSE.xls")


Data = read_xls("ProcessDataMSSE.xls")

str(Data) # tbl_df, tbl, and data.frame classes

### Change Likert scores to factor and specify levels; factors because numeric values are ordinal

Data <- Data[, c(3:26)] # Get rid of the other columns! (Drop multiple columns) 

Data$`1Pre` <- factor(Data$`1Pre`,
                   levels = c("1", "2", "3", "4"),
                   ordered = TRUE)

Data$`1Post` = factor(Data$`1Post`,
                     levels = c("1", "2", "3", "4"),
                     ordered = TRUE)

Data$`2Pre` <- factor(Data$`2Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`2Post` = factor(Data$`2Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`3Pre` <- factor(Data$`3Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`3Post` = factor(Data$`3Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`4Pre` <- factor(Data$`4Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`4Post` = factor(Data$`4Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`5Pre` <- factor(Data$`5Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`5Post` = factor(Data$`5Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`6Pre` <- factor(Data$`6Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`6Post` = factor(Data$`6Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`7Pre` <- factor(Data$`7Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`7Post` = factor(Data$`7Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`8Pre` <- factor(Data$`8Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`8Post` = factor(Data$`8Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`9Pre` <- factor(Data$`9Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`9Post` = factor(Data$`9Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`10Pre` <- factor(Data$`10Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`10Post` = factor(Data$`10Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`11Pre` <- factor(Data$`11Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`11Post` = factor(Data$`11Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`12Pre` <- factor(Data$`12Pre`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data$`12Post` = factor(Data$`12Post`,
                      levels = c("1", "2", "3", "4"),
                      ordered = TRUE)

Data <- factor(Data,levels=Data[3:26])
Data
### Double check the data frame

library(psych) # Loads psych package

headTail(Data) # Displays last few and first few data

str(Data) # Shows structure of an object (observations and variables, etc.) - in this case, ordinal factors with 4 levels (1 through 4)

summary(Data) # Summary of the number of times you see a data point

Data$`1Pre` # This allows us to check how many data points are really there

str(Data)
### Remove unnecessary objects, removing the data frame in this case (we've converted that data frame into a table with the read.table function above)

library(likert)

Data <- as.data.frame(Data) # Makes the tibble a data frame

likert(Data) # This will give the percentage responses for each level and group

Result = likert(Data)

summary(Result) # This will give the mean and SD 


plot(Result,
     main = "Pre and Post Treatment Percentage Responses",
     ylab="Questions",
     type="bar")

Likert stacked bar chart in ggplot with pre and posttest

2 Answers2

Edit: