
I have a dataset that contains marks for 3 tests. The first test was taken before the experiment; the second and third were taken after it. I want to show graphically that students' marks have improved since the experiment. I chose a boxplot for this. With it, I plan to show the maximum and minimum values in each test and how they improved after the experiment. Is that a good approach?

Dinuka
  • Are the 30 observations from 30 students who completed each of the 3 tests, or are there 90 students in 3 separate groups of 30? – Edward Apr 16 '20 at 08:23
  • 30 observations from 30 students who completed each of the 3 tests. – Dinuka Apr 16 '20 at 08:27
  • Then a boxplot is not a good graphic. It ignores the fact that the same students completed 3 tests. – Edward Apr 16 '20 at 08:28
  • What if I draw a boxplot for each test. Then the total is 3 boxplots in the same graph. Or else if I consider mean value for each test marks, then I can say improvements of mean marks for each test. – Dinuka Apr 16 '20 at 08:31
  • Don't use a boxplot. Show a line plot of all students which shows the change over time for each student. You can add the mean if you like. But the change for each student is more important. – Edward Apr 16 '20 at 08:41
  • Thanks for the answer – Dinuka Apr 16 '20 at 16:09

2 Answers


Your data is longitudinal. Therefore, it is better to show the individual changes over time.

Multiple boxplots ignore the individual changes over time and treat each time point as a separate and unconnected group. Longitudinal line plots can show more information in the data.

Consider the following simulated data.

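# Simulate marks for 30 students; each later test builds on the same student's earlier mark
set.seed(1)
x1 <- rnorm(30, mean=50, sd=20)
x2 <- x1+rnorm(30, mean=5, sd=10)
x3 <- x2+rnorm(30, mean=5, sd=5)

data <- data.frame(x1, x2, x3)

library(tidyverse)

# Line plot: one line per student, plus the mean across students as a thick coloured line
data %>%
  mutate(id=row_number()) %>%
  pivot_longer(-id, names_prefix="x", names_to="time") %>%
  ggplot(aes(y=value, x=time, group=id)) +
  geom_point() +
  geom_line() +
  stat_summary(aes(group=1), fun=mean, geom="line",lwd=2, col=2)

# Boxplots: one box per test, ignoring which marks belong to the same student
data %>%
  pivot_longer(everything(), names_prefix="x", names_to="time") %>%
  ggplot(aes(y=value, x=time))+
  geom_boxplot()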


Those who scored poorly in the first test continued to do poorly in the second and third tests, something that the boxplot has missed.
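As a rough numerical companion to the plots, the sketch below re-creates the same simulated marks and summarises the per-student changes. The names `marks`, `rank_cor` and `gain` are illustrative and not part of the code above, and the Spearman correlation is just one possible way to check whether students kept their relative position.

```r
# Re-create the simulated marks (same seed and parameters as above)
set.seed(1)
x1 <- rnorm(30, mean = 50, sd = 20)
x2 <- x1 + rnorm(30, mean = 5, sd = 10)
x3 <- x2 + rnorm(30, mean = 5, sd = 5)
marks <- data.frame(x1, x2, x3)  # 'marks' is an illustrative name

# Rank correlation between the first and the last test:
# values near 1 mean students largely kept their relative position
rank_cor <- cor(marks$x1, marks$x3, method = "spearman")
print(rank_cor)

# Change for each student from the first to the third test
gain <- marks$x3 - marks$x1
print(summary(gain))

# Proportion of students whose mark went up
print(mean(gain > 0))
```

None of these numbers can be read off side-by-side boxplots, because each one depends on knowing which marks belong to the same student.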

Edward

You can use a boxplot to see whether the students as a group have improved. But imagine the good students improve a lot, the moderate students get worse, and the weak students improve. The boxplot will show that the students in general improved, but you will miss the fact that the moderate students actually got worse. To see this, you can use a parallel coordinate plot. There is an implementation in the GGally package. With 30 observations the plot is still easy to read.
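A minimal sketch of such a plot, assuming the GGally package is installed and using simulated marks rather than the real data (the data frame `marks` and the plot object `p` are illustrative names). The option `scale = "globalminmax"` is chosen here so that the marks stay on their original scale instead of being standardised per test.

```r
library(GGally)

# Simulated marks for 30 students on 3 tests (placeholder for the real data)
set.seed(1)
x1 <- rnorm(30, mean = 50, sd = 20)
x2 <- x1 + rnorm(30, mean = 5, sd = 10)
x3 <- x2 + rnorm(30, mean = 5, sd = 5)
marks <- data.frame(x1, x2, x3)

# One line per student across the three tests, without rescaling the marks
p <- ggparcoord(marks, columns = 1:3, scale = "globalminmax",
                showPoints = TRUE, alphaLines = 0.6) +
  ggplot2::labs(x = "Test", y = "Mark")

print(p)
```

If the students fall into known groups (for example weak, moderate and good), adding a grouping column to the data and passing it via `groupColumn` would colour the lines by group.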

bonedi