I created a bar chart using geom_bar with "Group" on the x-axis (Female, Male), and "Values" on the y-axis. Group is further subdivided into "Session" such that there is "Session 1" and "Session 2" for both Male and Female (i.e. four bars in total).

Since all participants took part in both Session 1 and Session 2, I overlaid a dotplot (geom_dotplot) on each of the four bars to show the individual data.

I am now trying to connect each participant's ("PID") observations between Session 1 and Session 2. In other words, there should be lines connecting several pairs of points (one pair per participant) on the "Male" portion of the x-axis, and likewise on the "Female" portion.

I tried this with "geom_line" (below) but to no avail (instead, it created a single vertical line in the middle of "Male" and another in the middle of "Female"). I'm not too sure how to fix this.

See code below:

ggplot(data_foo, aes(x=factor(Group),y=Values, colour = factor(Session), fill = factor(Session))) + 
          geom_bar(stat = "summary", fun.y = "mean", position = "dodge") + 
          geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 1.0, position = "dodge", fill = "black") +
          geom_line(aes(group = PID), colour="dark grey") +
          labs(title='My Data',x='Group',y='Values') +
          theme_light() 

Sample data (.txt)

data_foo <- readr::read_csv("PID,Group,Session,Values
P1,F,1,14
P2,F,1,13
P3,F,1,16
P4,M,1,18
P5,F,1,20
P6,M,1,27
P7,M,1,19
P8,M,1,11
P9,F,1,28
P10,F,1,20
P11,F,1,24
P12,M,1,10
P1,F,2,26
P2,F,2,21
P3,F,2,19
P4,M,2,13
P5,F,2,26
P6,M,2,15
P7,M,2,23
P8,M,2,23
P9,F,2,30
P10,F,2,21
P11,F,2,11
P12,M,2,19")
tjebo
Grace
  • welcome to SO. Could you kindly either create sample data or use `dput(head(data,20))` and post the output? You can also use one of the many inbuilt data sets in R – tjebo Jan 15 '20 at 00:05
  • I am also not sure if you really want `geom_dotplot`? Maybe `geom_point` instead? – tjebo Jan 15 '20 at 00:09
  • Thanks Tjebo - I am adding sample data right now! – Grace Jan 15 '20 at 16:57
  • P.S. I noticed that adding this line "geom_line(aes(x=factor(Session), group = PID), colour="dark grey") +" will make my lines separate from my bar graph (in case this might be a lead?) – Grace Jan 15 '20 at 18:44
  • P.S. to my answer: this is basically a nice exercise for learning ggplot, but I generally think this type of visualisation is rather confusing. In my own example from my very first question, I actually completely changed the way I showed the data. Maybe try plotting the Session 1 value on the x axis and the Session 2 value on the y axis, just with `geom_point`, and fill/color by `Group`. You will have all the information you want to show in a much more understandable way (and it is easier to plot) – tjebo Jan 15 '20 at 18:55
  • Hi Tjebo, thank you so much for your comment and suggestion!! – Grace Jan 15 '20 at 20:08

2 Answers

The trouble is that you want to dodge by more than one grouping. Your geom_line does not know how to split the Group variable by Session: both observations of a participant share the same x position (their Group), so every line collapses into a vertical segment. Here are two ways to address this problem. Method 1 is probably the most idiomatic ggplot2 approach, and a neat way of adding another grouping without overcrowding the visualisation. For Method 2 you need to change your x variable.

1) Use facet

2) Use interaction to split session for each Group. Define levels for the right bar order

I have also used geom_point instead, because geom_dotplot is really a specific type of histogram. I would generally recommend using boxplots for values like these, because bars are more appropriate for specific measures such as counts.

Method 1: Facets

library(ggplot2)
ggplot(data_foo, aes(x = Session, y = Values, fill = as.character(Session))) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") + 
  geom_line(aes(group = PID)) +
  geom_point(aes(group = PID), shape = 21, color = 'black') +
  facet_wrap(~Group)

Created on 2020-01-20 by the reprex package (v0.3.0)
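
Following the boxplot recommendation above, the same faceted layout can be sketched with boxplots in place of the bars. This is only an illustration: the sample data are rebuilt inline so the snippet runs on its own, and the name `p_box`, the grey line colour, and the hidden outliers (`outlier.shape = NA`, since every observation is already drawn as a point) are my own choices rather than part of the original answer.

```r
library(ggplot2)

# Sample data from the question, rebuilt inline so the snippet is self-contained
data_foo <- data.frame(
  PID     = rep(paste0("P", 1:12), times = 2),
  Group   = rep(c("F", "F", "F", "M", "F", "M", "M", "M", "F", "F", "F", "M"), times = 2),
  Session = rep(1:2, each = 12),
  Values  = c(14, 13, 16, 18, 20, 27, 19, 11, 28, 20, 24, 10,
              26, 21, 19, 13, 26, 15, 23, 23, 30, 21, 11, 19)
)

# Treat Session as a category so each session gets its own box
data_foo$Session <- factor(data_foo$Session)

p_box <- ggplot(data_foo, aes(x = Session, y = Values)) +
  # Outliers are hidden because all points are drawn by geom_point below
  geom_boxplot(aes(fill = Session), outlier.shape = NA, alpha = 0.5) +
  # One line per participant, joining their two sessions
  geom_line(aes(group = PID), colour = "darkgrey") +
  geom_point(shape = 21, fill = "black") +
  facet_wrap(~Group)

p_box
```

With only six participants per group the boxes are coarse, so the individual points and lines carry most of the information here.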

Method 2: Create an interaction term for your x variable. Note that you need to order the factor levels manually.

library(dplyr)

data_foo <- data_foo %>% mutate(new_x = factor(interaction(Group,Session), levels = c('F.1','F.2','M.1','M.2')))

ggplot(data_foo, aes(x = new_x, y = Values, fill = as.character(Session))) + 
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  geom_line(aes(group = PID)) +
  geom_point(aes(group = PID), shape = 21, color = 'black') 

Created on 2020-01-20 by the reprex package (v0.3.0)
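
If Group has to stay on the x-axis with Session as the dodged variable, as in the question's original plot, one possible workaround is to compute the dodged x positions by hand and hand them to the lines and points. A rough sketch, assuming a dodge width of 0.9 (set explicitly below) and ggplot2 3.3.0 or later for the `fun` argument; `dodge_width`, `data_pos` and `x_pos` are hypothetical names introduced here, and the sample data are rebuilt inline so the snippet runs on its own:

```r
library(ggplot2)

# Sample data from the question, rebuilt inline so the snippet is self-contained
data_foo <- data.frame(
  PID     = rep(paste0("P", 1:12), times = 2),
  Group   = rep(c("F", "F", "F", "M", "F", "M", "M", "M", "F", "F", "F", "M"), times = 2),
  Session = rep(1:2, each = 12),
  Values  = c(14, 13, 16, 18, 20, 27, 19, 11, 28, 20, 24, 10,
              26, 21, 19, 13, 26, 15, 23, 23, 30, 21, 11, 19)
)

dodge_width <- 0.9  # assumed total width shared by the two dodged bars

data_pos <- transform(data_foo, Group = factor(Group), Session = factor(Session))

# Numeric x: the Group index (F = 1, M = 2) shifted left for Session 1
# and right for Session 2, so it matches the centre of each dodged bar
data_pos$x_pos <- as.numeric(data_pos$Group) +
  (as.numeric(data_pos$Session) - 1.5) * dodge_width / 2

p_dodge <- ggplot(data_pos, aes(x = Group, y = Values, fill = Session)) +
  geom_bar(stat = "summary", fun = "mean",
           position = position_dodge(width = dodge_width)) +
  # Lines and points use the hand-computed positions instead of dodging
  geom_line(aes(x = x_pos, group = PID), colour = "darkgrey") +
  geom_point(aes(x = x_pos), shape = 21, colour = "black", fill = "black")

p_dodge
```

The discrete layer comes first so that the x scale stays discrete; the numeric `x_pos` values are then placed on that scale as-is. The offsets only line up with the bars for two sessions and this dodge width, so they would need recomputing for other layouts.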

Either way, the result is not very compelling visually.
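
One alternative, suggested in the comments under the question, is to drop the bars altogether and plot each participant's Session 1 value against their Session 2 value. A minimal sketch, with the data reshaped using base R `merge()`; the names `wide`, `Values_s1`, `Values_s2` and `p_scatter`, and the dashed identity line, are assumptions added for illustration:

```r
library(ggplot2)

# Sample data from the question, rebuilt inline so the snippet is self-contained
data_foo <- data.frame(
  PID     = rep(paste0("P", 1:12), times = 2),
  Group   = rep(c("F", "F", "F", "M", "F", "M", "M", "M", "F", "F", "F", "M"), times = 2),
  Session = rep(1:2, each = 12),
  Values  = c(14, 13, 16, 18, 20, 27, 19, 11, 28, 20, 24, 10,
              26, 21, 19, 13, 26, 15, 23, 23, 30, 21, 11, 19)
)

# One row per participant: Session 1 and Session 2 values side by side
s1 <- data_foo[data_foo$Session == 1, c("PID", "Group", "Values")]
s2 <- data_foo[data_foo$Session == 2, c("PID", "Values")]
wide <- merge(s1, s2, by = "PID", suffixes = c("_s1", "_s2"))

p_scatter <- ggplot(wide, aes(x = Values_s1, y = Values_s2, colour = Group)) +
  geom_point(size = 3) +
  # Points above the dashed line increased from Session 1 to Session 2
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Values (Session 1)", y = "Values (Session 2)")

p_scatter
```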

tjebo
  • Hi Again! This looks great - I was hoping to have Session as my grouping variable, and Group as my x-variable. I tried moving these around but it doesn't really seem to work. – Grace Jan 15 '20 at 20:29
  • Hi Tjebo - I will re-accept, but I thought to leave it unaccepted in case someone knows how to swap the variables, such that Session is the grouping variable and Gender is the x-axis. It did not let me put categorical variables on the x-axis with this method. I'm sure that through re-labeling I might be able to fix this problem, but in case there is a more efficient way to do this, I thought to check. Thanks so much! – Grace Jan 20 '20 at 15:01
  • OK I think I finally understood what you want. Give me a second – tjebo Jan 20 '20 at 15:59
  • Thank you, thank you, thank you. This was *beyond* useful, thank you so much. – Grace Jan 20 '20 at 19:48

I suggest a few visualization tweaks to make the chart more informative. For example, giving each PID its own colour helps track each participant's change across the levels of the other variables. Something like:

library(ggplot2)
ggplot(data_foo, aes(x = factor(Session), y = Values, fill = factor(Session))) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") + 
  geom_line(aes(group = factor(PID), colour=factor(PID)), size=2, alpha=0.7) +
  geom_point(aes(group = factor(PID), colour=factor(PID)), shape = 21, size=2, show.legend = FALSE) +
  theme_bw() + 
  labs(x='Session',fill='Session',colour='PID')+
  theme(legend.position="right") +
  facet_wrap(~Group)+
  scale_colour_discrete(breaks=paste0('P',1:12))

And we have the following plot:

[Image: bar chart of mean Values per Session, faceted by Group, with one coloured line per PID]

Hope it helps.

Afshin