I’m totally new to ggplot, relatively fresh with R and want to make a smashing ”before-and-after” scatterplot with connecting lines to illustrate the movement in percentages of different subgroups before and after a special training initiative. I’ve tried some options, but have yet to:
- show each individual observation separately (now same values are overlapping)
- connect the related before and after measures (x=0 and X=1) with lines to more clearly illustrate the direction of variation
- subset the data along class and id using shape and colors
How can I best create a scatter plot using ggplot (or other) fulfilling the above demands?
Main alternative: geom_point()
Here is some sample data and example code using genom_point
x <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1) # 0=before, 1=after
y <- c(45,30,10,40,10,NA,30,80,80,NA,95,NA,90,NA,90,70,10,80,98,95) # percentage of ”feelings of peace"
class <- c(0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1) # 0=multiple days 1=one day
id <- c(1,1,2,3,4,4,4,4,5,6,1,1,2,3,4,4,4,4,5,6) # id = per individual
df <- data.frame(x,y,class,id)
ggplot(df, aes(x=x, y=y), fill=id, shape=class) + geom_point()
Alternative: scale_size()
I have explored stat_sum() to summarize the frequencies of overlapping observations, but then not being able to subset using colors and shapes due to overlap.
ggplot(df, aes(x=x, y=y)) +
stat_sum()
Alternative: geom_dotplot()
I have also explored geom_dotplot() to clarify the overlapping observations that arise from using genom_point() as I do in the example below, however I have yet to understand how to combine the before and after measures into the same plot.
df1 <- df[1:10,] # data before
df2 <- df[11:20,] # data after
p1 <- ggplot(df1, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
p2 <- ggplot(df2, aes(x=x, y=y)) +
geom_dotplot(binaxis = "y", stackdir = "center",stackratio=2,
binwidth=(1/0.3))
grid.arrange(p1,p2, nrow=1) # GridExtra package