I'm working on my MSc thesis, but I'm having trouble getting R to show what I want to see and analyze.

I have a data frame like this:

Subject_ID   Type     Speed1      Speed2     Speed3    ...    Speed20
    1          A        25          27         24               31
    2          B        32          21         35               33
    3          B        21          25         27               29
    4          A        31          28         38               20
    5          A        30          22         21               28
    6          B        27          33         31               24

The data comes from an economic game programmed in z-Tree, and I'm reading it into R. In the game, subjects choose their speed in each period, and there are 20 periods. I want to find behavioral differences between subjects of different types.

I want to characterize the behaviour of each type, using several subjects for each type. For example, Type A subjects might use higher speeds on average but with high variance, while Type B subjects might have a lower average speed per period but with lower variance.

Ideally, I want to see summary statistics grouped by Type. I would also love to see a graph in which each line represents one Type, with time on the x-axis and speed on the y-axis.

Ben Bolker

1 Answer


This could be a good starting point. To compare the two types, first look at how speed evolves across all subjects and then decide how to summarize it. Your data is in wide format, so to use ggplot2 functions you first have to reshape it to long format with pivot_longer() from tidyr (part of the tidyverse). After that, it is possible to build the plot. The code below produces a plot that is split by Type, with one colored line per speed column. The plot you describe has ribbons, but to add them your dataset needs variables that define the lower and upper limits. Here is the solution:

library(tidyverse)
#Code
df %>% pivot_longer(cols = -c(Subject_ID,Type)) %>%
  rename(Speed=name) %>%
  mutate(Speed=factor(Speed,levels = unique(Speed))) %>%
  ggplot(aes(x=factor(Subject_ID),y=value,color=Speed,group=Speed))+
  geom_point()+
  geom_line(linewidth=1)+  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_bw()+
  facet_wrap(.~Type,scales='free')+
  xlab('Subject')

Output:

[Plot: speed by subject, one panel per Type, one colored line per speed column]

Some data used:

#Data
df <- structure(list(Subject_ID = 1:6, Type = c("A", "B", "B", "A", 
"A", "B"), Speed1 = c(25L, 32L, 21L, 31L, 30L, 27L), Speed2 = c(27L, 
21L, 25L, 28L, 22L, 33L), Speed3 = c(24L, 35L, 27L, 38L, 21L, 
31L), Speed20 = c(31L, 33L, 29L, 20L, 28L, 24L)), class = "data.frame", row.names = c(NA, 
-6L))
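The wide-to-long reshape described above can also be sketched on its own. This is a minimal example, assuming the sample data from this answer (which only contains Speed1, Speed2, Speed3 and Speed20) and tidyr 1.1.0 or later for `names_transform`. The column names `Period` and `Speed` and the object name `long_df` are illustrative choices, not part of the original code.

```r
library(dplyr)
library(tidyr)

# Sample data from this answer (only 4 of the 20 periods are present)
df <- data.frame(
  Subject_ID = 1:6,
  Type = c("A", "B", "B", "A", "A", "B"),
  Speed1 = c(25L, 32L, 21L, 31L, 30L, 27L),
  Speed2 = c(27L, 21L, 25L, 28L, 22L, 33L),
  Speed3 = c(24L, 35L, 27L, 38L, 21L, 31L),
  Speed20 = c(31L, 33L, 29L, 20L, 28L, 24L)
)

# One row per subject and period; "Speed7" becomes the integer 7
long_df <- df %>%
  pivot_longer(
    cols = starts_with("Speed"),
    names_to = "Period",
    names_prefix = "Speed",
    names_transform = list(Period = as.integer),
    values_to = "Speed"
  )

head(long_df)
```

Storing the period as an integer rather than as a label such as "Speed20" lets it be used directly as a time axis later on.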

If you do not want the plot split by Type, you can drop the facet_wrap() line and obtain this:

#Code 2
df %>% pivot_longer(cols = -c(Subject_ID,Type)) %>%
  rename(Speed=name) %>%
  mutate(Speed=factor(Speed,levels = unique(Speed))) %>%
  ggplot(aes(x=factor(Subject_ID),y=value,color=Speed,group=Speed))+
  geom_point()+
  geom_line(linewidth=1)+  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_bw()+
  xlab('Subject')

Output:

[Plot: speed by subject in a single panel, one colored line per speed column]

Update: You can use group_by() and summarise() with sum() to total each subject's speeds across periods and color the result by Type with the following code:

#Code 3
df %>% pivot_longer(cols = -c(Subject_ID,Type)) %>%
  rename(Speed=name) %>%
  group_by(Subject_ID,Type) %>%
  summarise(value=sum(value)) %>%
  ggplot(aes(x=factor(Subject_ID),y=value,color=Type,group=Type))+
  geom_point()+
  geom_line(linewidth=1)+  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_bw()+
  xlab('Subject')

Output:

[Plot: total speed per subject, one colored line per Type]
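The question also asks for summary statistics grouped by Type. A minimal sketch of one way to get them, assuming the sample data from this answer (four of the 20 periods). The names `long_df`, `subject_stats`, `type_summary` and the summary columns are illustrative, and averaging the per-subject SDs is just one possible measure of within-type variability.

```r
library(dplyr)
library(tidyr)

# Sample data from this answer (only 4 of the 20 periods are present)
df <- data.frame(
  Subject_ID = 1:6,
  Type = c("A", "B", "B", "A", "A", "B"),
  Speed1 = c(25L, 32L, 21L, 31L, 30L, 27L),
  Speed2 = c(27L, 21L, 25L, 28L, 22L, 33L),
  Speed3 = c(24L, 35L, 27L, 38L, 21L, 31L),
  Speed20 = c(31L, 33L, 29L, 20L, 28L, 24L)
)

long_df <- df %>%
  pivot_longer(cols = starts_with("Speed"),
               names_to = "Period", values_to = "Speed")

# Mean and SD of each subject's own choices across periods
subject_stats <- long_df %>%
  group_by(Type, Subject_ID) %>%
  summarise(subject_mean = mean(Speed), subject_sd = sd(Speed),
            .groups = "drop")

# One row per Type: average level and average within-subject spread
type_summary <- subject_stats %>%
  group_by(Type) %>%
  summarise(n_subjects = n(),
            mean_speed = mean(subject_mean),
            mean_subject_sd = mean(subject_sd),
            .groups = "drop")

print(type_summary)
```

Summarising per subject first keeps the spread of an individual's choices separate from differences between subjects of the same type.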

Computing the mean and SD of each subject's speeds and drawing a ribbon for mean ± SD will produce this:

#Code 4
df %>% pivot_longer(cols = -c(Subject_ID,Type)) %>%
  rename(Speed=name) %>%
  group_by(Subject_ID,Type) %>%
  # Mean and SD of each subject's speeds across periods; the band is Mean ± SD
  summarise(Mean=mean(value),SD=sd(value),
            Low=Mean-SD,Up=Mean+SD) %>%
  ggplot(aes(x=factor(Subject_ID),y=Mean,color=Type,group=Type))+
  geom_line(linewidth=1)+  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point()+
  geom_ribbon(
    aes(ymin = Low, ymax = Up,fill=Type), 
    alpha = 0.2
  )+
  theme_bw()+
  xlab('Subject')

Output:

[Plot: per-subject values, one line per Type with a shaded ribbon]
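For the graph described in the question (time on the x-axis, one line per Type), the aggregation can be done per Type and period instead of per subject. This is a sketch only, assuming the sample data from this answer, tidyr 1.1.0 or later, and a mean ± 1 SD band across the subjects of each type. The names `Period`, `Speed`, `period_summary` and `p` are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from this answer (only 4 of the 20 periods are present)
df <- data.frame(
  Subject_ID = 1:6,
  Type = c("A", "B", "B", "A", "A", "B"),
  Speed1 = c(25L, 32L, 21L, 31L, 30L, 27L),
  Speed2 = c(27L, 21L, 25L, 28L, 22L, 33L),
  Speed3 = c(24L, 35L, 27L, 38L, 21L, 31L),
  Speed20 = c(31L, 33L, 29L, 20L, 28L, 24L)
)

# Mean and SD across subjects, for each Type in each period
period_summary <- df %>%
  pivot_longer(cols = starts_with("Speed"),
               names_to = "Period",
               names_prefix = "Speed",
               names_transform = list(Period = as.integer),
               values_to = "Speed") %>%
  group_by(Type, Period) %>%
  summarise(Mean = mean(Speed), SD = sd(Speed), .groups = "drop") %>%
  mutate(Low = Mean - SD, Up = Mean + SD)

# One line per Type over time, with a shaded mean ± SD band
p <- ggplot(period_summary,
            aes(x = Period, y = Mean, color = Type, fill = Type)) +
  geom_ribbon(aes(ymin = Low, ymax = Up), alpha = 0.2, color = NA) +
  geom_line() +
  geom_point() +
  theme_bw() +
  labs(x = "Period", y = "Mean speed")

print(p)
```

With the full data set all 20 periods appear on the x-axis; the sample data jumps from period 3 to period 20, so the line segment between those points spans the missing periods.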

Duck
  • That's useful, thanks. But is there a way to show an aggregate output grouping by Type? Like the image I posted. – Vicente Ramírez Garat Sep 18 '20 at 21:22
  • @VicenteRamírezGarat Yes, I will add an update in a few minutes! – Duck Sep 18 '20 at 21:23
  • @VicenteRamírezGarat I have added an update, I hope that can help you! – Duck Sep 18 '20 at 21:27
  • Nice, it looks great! Maybe to show its inner variance, I can create a new sequence for the upper deviation and the lower deviation; it should be mean+deviation and mean-deviation, or so, right? I'm new to R; I mostly use Python, Pandas and SQL. – Vicente Ramírez Garat Sep 18 '20 at 21:32
  • @VicenteRamírezGarat Yes, you are right, and you can play around with that to obtain the plot you want, using the code in this post as a base! – Duck Sep 18 '20 at 21:34
  • Now I'm trying to complement this with these two answers I found: https://www.biostars.org/p/312201/ and https://stackoverflow.com/questions/24626280/plot-mean-and-standard-deviation-by-category – Vicente Ramírez Garat Sep 18 '20 at 21:41
  • @VicenteRamírezGarat I have added an update using some info from those posts! I think you will be able to replicate the code. – Duck Sep 18 '20 at 21:49
  • It's like learning to ride a new bike model. The only thing I'm not getting is that X axis should be Time, because they are periods. First x-step needs to show aggregate behaviour for each type but just for period 1 (speed1). Thanks for all your help, though. – Vicente Ramírez Garat Sep 18 '20 at 22:19