
I'm trying to create a dotplot in R, similar to the following plot, where each group is distinctly separated from the rest: http://www.sthda.com/english/wiki/ggplot2-dot-plot-quick-start-guide-r-software-and-data-visualization
ideal plot

My data looks as follows: a value to plot, and a group column that should bin the data into distinct groups (1-5), similar to the 'dose' column in the ToothGrowth dataset used in the link above:
my data

This is the plotting code I'm currently using:

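library(ggplot2)

# First attempt: bin along y, dodge the stacks by group
p <- ggplot(new_df, aes(x = group, y = ploidy)) +
  geom_dotplot(binaxis = 'y', stackdir = 'centerwhole', binpositions = 'bygroup',
               binwidth = 0.5, position = "dodge", dotsize = 0.2)

# Second attempt: same data, different stacking options
ggplot(new_df, aes(x = group, y = ploidy)) +
  geom_dotplot(binaxis = 'y', stackdir = 'centerwhole',
               stackratio = 0, dotsize = 0.2, stackgroups = TRUE)

# Overlay the median of each group as a red diamond
p + stat_summary(fun = median, geom = "point", shape = 18,
                 size = 3, color = "red")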

and it returns the following plot:

current plot

I suspect the issue is that the majority of the values sit in the 2-3 range, so they overflow into the other bins/groups.

I tried re-creating the problem with simple datasets like ToothGrowth, but the issue doesn't reappear in those smaller datasets. Since small sample datasets don't reproduce the problem, here is a link to my dataset: http://sendanywhe.re/Y5O133EM

Any help would be appreciated

Samer Baslan

1 Answer


I think you are overflowing the space allocated to each group in the chart by giving every individual observation its own fixed location (sometimes called 'stacking'). Instead, you should 'jitter' the positions of the individual observations within the region allocated to each group. Jittering means adding a small amount of randomness to the position of each point to avoid (most) overplotting.

I will illustrate this using graphics from the core of R for the following fictitious data. This focuses attention on what is wrong, more than on the specific programming solution in ggplot, which I will let you work out.

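set.seed(2022)
# Four fictitious groups of different sizes, rounded to whole numbers
a <- round(rnorm(30, 50, 5))
b <- round(rnorm(70, 55, 4))
c <- round(rnorm(55, 40, 6))
d <- round(rnorm(80, 45, 5))
x <- c(a, b, c, d)                    # all values in one vector
g <- rep(1:4, c(30, 70, 55, 80))      # group label for each value

# One vertical strip per group, with points jittered horizontally
stripchart(x ~ g, method = "jitter", vertical = TRUE, pch = 20)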

Sorry, not allowed to post images on this site. Hope you get the idea.
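For anyone who wants a starting point in ggplot2, here is a minimal sketch of the same jittering idea. It is only one way to do it, not necessarily the best one. The data frame `df` and its column names `value` and `group` are made up for illustration (in the question they would be `new_df`, `ploidy` and `group`), and the jitter width of 0.2 is an arbitrary choice. It also assumes a ggplot2 version recent enough that `stat_summary()` accepts `fun` (3.3.0 or later).

```r
library(ggplot2)

# Fictitious data in the same spirit as above (names are illustrative only)
set.seed(2022)
df <- data.frame(
  value = round(c(rnorm(30, 50, 5), rnorm(70, 55, 4),
                  rnorm(55, 40, 6), rnorm(80, 45, 5))),
  # Treat the group as a factor so each group gets its own discrete x position
  group = factor(rep(1:4, c(30, 70, 55, 80)))
)

# Jitter horizontally only (height = 0 leaves the plotted values unchanged)
p <- ggplot(df, aes(x = group, y = value)) +
  geom_jitter(width = 0.2, height = 0, size = 1) +
  stat_summary(fun = median, geom = "point", shape = 18,
               size = 3, color = "red")
print(p)
```

Keeping `width` well below 0.5 keeps each group's points inside its own band on the x axis, which is what stops them from spilling into the neighbouring groups.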

  • Thanks for the helpful answer. You're not allowed to post any images on this site? Why does it allow you to do so then? Is my use of images in my post 'not recommended'? The rules here can be confusing sometimes. I originally posted this here and was told to post to Cross Validated, then my question was migrated back to Stack Overflow. – Samer Baslan Feb 26 '22 at 03:04
  • This actually solved my problem, thank you. – Samer Baslan Feb 26 '22 at 03:52