R: ggplot2: avoid overlapping points and color formatting

Question

I am new to SO and relatively new to R so please take it easy on me!

This is my scenario: I have a dataframe that has 24 meta-analytic distributions (Dist1-Dist24). For each distribution, I have seven estimates of the respective meta-analytic mean effect size before outlier removal (ES1.before-ES7.before) and seven estimates after outlier removal (ES1.after-ES7.after). Thus, there can be a total of 14 points per distribution.

What I am able to do: Using the below dataframe and script, I am able to plot all estimates for each distribution.

Here is my dataframe, which is delimited using a "|" (pipe). Note that some cells are purposefully blank.

|x|ES1.before|ES2.before|ES3.before|ES4.before|ES5.before|ES6.before|ES7.before|ES1.after|ES2.after|ES3.after|ES4.after|ES5.after|ES6.after|ES7.after|
|Dist 1|-0.10|-0.1|-0.02|-0.04|-0.13|-0.03|0.00|-0.09|-0.09|-0.04|-0.06|-0.11|-0.03|-0.010|
|Dist 2|-0.11|-0.11|-0.06|-0.07|-0.14|-0.08|-0.06|-0.09|-0.09|-0.06|-0.07|-0.11|-0.08|-0.06|
|Dist 3|-0.12|-0.13|-0.03|-0.12|-0.15|-0.07|-0.02|-0.09|-0.09|-0.03|-0.04|-0.11|-0.04|0.00|
|Dist 4|-0.09|-0.09|-0.03|-0.04|-0.13|-0.05|-0.01|-0.11|-0.11|-0.06|-0.08|-0.13|-0.06|-0.05|
|Dist 5|-0.17|-0.17|-0.13|-0.13|-0.19|-0.11|-0.13|-0.16|-0.16|-0.13|-0.12|-0.18|-0.11|-0.13|
|Dist 6|-0.10|-0.10|-0.03|-0.04|-0.13|-0.04|-0.02|-0.08|-0.08|-0.04|-0.04|-0.10|-0.03|-0.02|
|Dist 7|-0.07|-0.07|-0.02|-0.02|-0.09|-0.06|-0.01|-0.08|-0.08|-0.03|-0.07|-0.09|-0.10|-0.01|
|Dist 8|-0.04|-0.05|0.00|-0.04|-0.09|-0.01|-0.01|-0.03|-0.03|-0.01|-0.01|-0.06|-0.01|0.00|
|Dist 9|-0.15|-0.15|-0.08|-0.11|-0.17|-0.08|-0.05|-0.12|-0.12|-0.08|-0.11|-0.14|-0.08|-0.07|
|Dist 10|-0.09|-0.09|-0.03|-0.05|-0.12|-0.03|0.00|-0.08|-0.09|-0.02|-0.05|-0.10|-0.03|0.00|
|Dist 11|-0.12|-0.09|-0.02|-0.02|-0.14|0|0.01|-0.12|-0.12|-0.07|-0.07|-0.13|-0.09|0.08|
|Dist 12|-0.07|-0.07|-0.02|-0.02|-0.11|-0.02|-0.01|
|Dist 13|-0.09|-0.09|-0.03|-0.05|-0.13|-0.03|-0.01|-0.07|-0.07|-0.03|-0.04|-0.09|-0.01|-0.01|
|Dist 14|-0.1|-0.09|-0.04|-0.06|-0.12|-0.05|-0.01|-0.12|-0.12|-0.05|-0.12|-0.14|-0.09|0.01|
|Dist 15|-0.05|-0.05|-0.04|-0.04|-0.04|-0.04|-0.03|
|Dist 16|-0.17|-0.17|-0.07|-0.11|-0.20|-0.09|-0.03|-0.16|-0.16|-0.07|-0.11|-0.18|-0.09|-0.07|
|Dist 17|-0.18|-0.18|-0.06|-0.18|-0.23|-0.09|-0.11|
|Dist 18|-0.23|-0.23|-0.05|-0.21|-0.26|-0.15|0.01|-0.11|-0.10|-0.04|-0.11|-0.12|-0.09|0.02|
|Dist 19|-0.07|-0.07|-0.02|-0.02|-0.12|-0.05|-0.01|-0.06|-0.06|-0.03|-0.03|-0.07|-0.02|0.00|
|Dist 20|-0.10|-0.10|-0.04|-0.07|-0.13|-0.06|-0.02|-0.10|-0.10|-0.04|-0.08|-0.12|-0.06|-0.02|
|Dist 21|-0.10|-0.10|-0.03|-0.05|-0.13|-0.04|-0.01|-0.09|-0.09|-0.05|-0.07|-0.11|-0.06|-0.05|
|Dist 22|-0.15|-0.15|-0.15|-0.15|-0.19|-0.04|-0.11|
|Dist 23|-0.11|-0.11|-0.05|-0.06|-0.14|-0.03|0.03|
|Dist 24|-0.10|-0.10|-0.04|-0.06|-0.16|-0.03|-0.05|-0.10|-0.10|-0.06|-0.07|-0.13|-0.04|-0.05|
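
A minimal sketch of reading a table laid out like the one above, assuming it is saved as pipe-delimited text with a leading and trailing pipe on every row. The two-row excerpt and the names `txt` and `raw` are illustrative only. `fill = TRUE` pads rows that stop early, so the purposefully blank cells come through as `NA`.

```r
# Illustrative two-row excerpt in the same layout as the table above
txt <- "|x|ES1.before|ES2.before|ES1.after|ES2.after|
|Dist 11|-0.12|-0.09|-0.12|-0.12|
|Dist 12|-0.07|-0.07|"

# sep = "|" splits on pipes; fill = TRUE pads short rows with NA
raw <- read.table(text = txt, sep = "|", header = TRUE,
                  fill = TRUE, strip.white = TRUE)

# The leading/trailing pipes create empty edge columns; keep only the named ones
d1 <- raw[, c("x", grep("^ES", names(raw), value = TRUE))]
d1
```

Selecting columns by name, rather than by position, avoids relying on how the empty edge columns happen to be labelled.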

Here is my code:

# setwd("your path")

# install packages (only needed once), then load them
install.packages("ggplot2")
library(ggplot2)
install.packages("reshape2")
library(reshape2)
# register the Times New Roman font (Windows only)
windowsFonts(TNR = windowsFont("Times New Roman"))

# load the data (sep must match the delimiter actually used in the file)
d1 <- read.table(file.choose(), sep=",", header=TRUE)

# normalize the data (i.e., convert into 'long' form)
dat1 = melt(d1, id.vars = "x")

# Set x factor order in order that appears in data
dat1$x = factor(dat1$x, levels = unique(dat1$x))

# Begin plotting
Fig1 <- ggplot(dat1, aes(x=x, y= value))+
    geom_point(aes(shape = variable)) +
    geom_line(data=dat1[!is.na(dat1$value),]) +
    scale_shape_manual(values = 0:13) +
    geom_hline(yintercept = 0, linetype=2) +
    coord_flip() +
    xlab('Distribution') +
    ylab('Effect size') +
    ylim(-.30, .05)

Fig1 <- Fig1 + 
    theme_bw() + 
    theme(panel.border = element_blank(), 
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(), 
          axis.line.x = element_line(colour = "black"),
          axis.line.y = element_line(colour = "black"),
          axis.text.x = element_text(size = 8),
          axis.text.y = element_text(size = 8),
          legend.position = ("bottom"),
          legend.title=element_blank(),
          legend.text=element_text(size=8),
          legend.key = element_rect(colour = NA))

Fig1

Here is the figure I can produce:

[Figure: Distribution of effect sizes]

Note that (1) the dataframe, (2) the syntax, and (3) the plot can be found in a Dropbox folder here

I have two questions:

(1) Can ES1.before and ES1.after (and so on) share the same shape but have different colors? For instance, is there a way to have ES1.before as a blue shape and ES1.after as a red shape, ES2.before and ES2.after as a different blue and red shape, respectively, and so on?

(2) Currently, the shapes overlap when the effect size values are the same. Is there a way to stack the shapes on top of each other when the effect size values are the same? Note that I have played around with the 'dodge' and 'jitter' functions extensively but have not been able to find a solution.

Thanks to anyone who is willing to help me :) I apologize if anything is unclear. I will be very happy to clarify anything that is unclear.

welcome to SO, `dput(your_df)` is the easiest way to add your example data for reproducibility — Nate, Sep 26 '16 at 19:15

Nate · Answer 1 · 2016-09-26T22:07:31.853

Does this work for you? In ggplot, if you want to vary colour or shape, it is best practice to give each variable that drives an aesthetic its own column in the plotting data frame. So I just moved before/after into its own column (`time`) and removed it from `variable`.
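
As a small illustration of that split, here is a sketch on a few made-up names in the same `ESn.before` / `ESn.after` pattern. The vector and the name `estimate` are only for demonstration; the code below overwrites `variable` instead.

```r
# Hypothetical column names in the same pattern as the melted data
variable <- c("ES1.before", "ES7.before", "ES1.after", "ES7.after")

# ".*\\." greedily matches everything up to the last dot, leaving the suffix
time <- gsub(".*\\.", "", variable)

# "\\..*" matches from the first dot to the end, leaving the prefix
estimate <- gsub("\\..*", "", variable)

data.frame(variable, estimate, time)
```

With `time` and the estimate label in separate columns, one can be mapped to `colour` and the other to `shape`.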

library(dplyr)
library(magrittr)
library(ggbeeswarm)
dat1 %<>% mutate(time = gsub(".*\\.", "", variable),
             variable = gsub("\\..*", "", variable))

# Begin plotting
Fig1 <- ggplot(dat1, aes(x=x, y= value, colour = time, group = variable))+
    geom_point(aes(shape = variable), position = position_quasirandom()) +
    geom_line(data=dat1[!is.na(dat1$value),], aes(colour = NULL, group = NULL)) +
    scale_shape_manual(values = 0:13) +
    geom_hline(yintercept = 0, linetype=2) +
    coord_flip() +
    xlab('Distribution') +
    ylab('Effect size') +
    ylim(-.30, .05)

Fig1 <- Fig1 + 
    theme_bw() + 
    theme(panel.border = element_blank(), 
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(), 
          axis.line.x = element_line(colour = "black"),
          axis.line.y = element_line(colour = "black"),
          axis.text.x = element_text(size = 8),
          axis.text.y = element_text(size = 8),
          legend.position = ("bottom"),
          legend.title=element_blank(),
          legend.text=element_text(size=8),
          legend.key = element_rect(colour = NA))

Fig1

# Bonus: Instead of setting all of those theme parameters each time try theme_set
# if you call this at the start of your script all of your plots will inherit these attributes
# these are just what I use regularly
theme_set(theme_light() + theme(axis.title = element_text(size = rel(1.5)),
                            axis.text = element_text(size = rel(1.5)),
                            plot.title = element_text(size = rel(2)),
                            strip.text = element_text(size = rel(1.5)),
                            legend.title = element_text(size = rel(1.5), face = "bold"),
                            legend.text = element_text(size = rel(1.5)),
                            axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = .5)))
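
A short sketch of how `theme_set` can be used without permanently changing the session, assuming the built-in `mtcars` data as a stand-in. `theme_set` returns the previously active theme, so it can be saved and restored.

```r
library(ggplot2)

# theme_set() returns the previously active theme, so keep it for later
old_theme <- theme_set(theme_bw() + theme(legend.position = "bottom"))

# Plots rendered while the new default is active pick it up automatically
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
print(p)

# Restore the earlier default when finished
theme_set(old_theme)
```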

First of all, thank you very much for taking the time to help me out. I am trying to replicate what you produced. However, things are breaking down when I try to 'mutate.' Specifically, I get the following error message: `Error: could not find function "%<>%"`. Do you have any suggestions for this? — James F, Sep 26 '16 at 20:28
What's your error? I probably should have included `library(dplyr);library(magrittr)` in my answer; they are used in the `mutate` call — Nate, Sep 26 '16 at 20:31
OK -- I am able to reproduce your figure, which is awesome! Thank you so much for getting me this far. However, it seems as if some shapes are still overlapping. In addition, in some cases, it looks like the shape is not on the line even when there is not a duplicate/overlapping one. As such, is it possible to do the following: have all shapes *ON* their respective lines and have the overlapping shapes dodged vertically? I don't know if this is possible. — James F, Sep 26 '16 at 21:14
The way `position_dodge` works is by creating bins so that each variable (in this case, shape) always occurs in the same "channel". This means that the variables won't collapse even when no other point is competing for that space. `position_jitter` might be closer to what you want, but it has the undesirable effect of not respecting "true" values meaning it might shift left or right and distort your data, albeit slightly — Nate, Sep 26 '16 at 21:59
There are some text plotting tools in `library(ggrepel)` that work similarly to what you are talking about. I just updated the answer using `position_quasirandom` from `library(ggbeeswarm)`. It has the closest behavior to what you are looking for (at least that I know of) — Nate, Sep 26 '16 at 22:05
Thanks for all your advice. The update you made was very helpful and will likely work for now. I thought there would have been an easy solution to my question but I was wrong! — James F, Sep 27 '16 at 19:59
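
On the `position_jitter` concern raised in the comments above, a rough sketch with made-up data (the `toy` data frame is hypothetical): `position_jitter()` takes separate `width` and `height` arguments, so setting `height = 0` confines the random shift to the discrete axis and leaves the plotted effect sizes at their true values. The offsets are still random rather than a tidy stack, so this is only a partial workaround.

```r
library(ggplot2)

# Made-up data with tied effect sizes within each distribution
toy <- data.frame(
  x        = factor(c("Dist 1", "Dist 1", "Dist 2", "Dist 2")),
  variable = c("ES1", "ES2", "ES1", "ES2"),
  value    = c(-0.10, -0.10, -0.11, -0.11)
)

# height = 0: no jitter along the value axis; width = 0.2: spread across the category
p <- ggplot(toy, aes(x = x, y = value, shape = variable)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1)) +
  coord_flip()

p
```

Fixing `seed` makes the offsets reproducible between runs.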
