My problem seems simple: I am using ggplot2 with geom_jitter() to plot a variable (see my picture for an example).

Jitter adds random noise to the variable (just called "1" in this example) to prevent overplotting. With random noise in the y-direction, points that would otherwise be completely overplotted are now much easier to see.

But here is my question:

As you can see, some points still overplot each other. In my example this could easily be prevented if the offsets in the y-direction were placed more strategically instead of randomly.

Can I somehow alter the behavior of geom_jitter(), or is there a similar function in ggplot2 that does exactly this?

Not really a minimal example, but also not too long:

library("imputeTS")
library("ggplot2")

data <- tsAirgap


# 2.1 Create required data

# Get all indices of the data that comes directly before and after an NA

na_indx_after <- which(is.na(data[1:(length(data) - 1)])) + 1
# Indexing from 2 shifts every position back by one, so no extra -1 is needed for "before"
na_indx_before <- which(is.na(data[2:length(data)]))

# Get the actual values at these indices and put them in a data frame with a label
before <- data.frame(id = "1", type = "before", input = na_remove(data[na_indx_before]))
after <- data.frame(id = "1", type = "after", input = na_remove(data[na_indx_after]))
all <- data.frame(id = "1", type = "source", input = na_remove(data))

# Get n values for the plot labels
n_before <- length(before$input)
n_all <- length(all$input)
n_after <- length(after$input)



# 2.4 Create dataframe for ggplot2

# join the data together in one dataframe
df <- rbind(before, after, all)


# Create the plot

gg <- ggplot(data = df) +
  geom_jitter(mapping = aes(x = id, y = input, color = type, alpha = type), width = 0.5, height = 0.5)

gg <- gg + ggplot2::scale_color_manual(
  values = c("before" = "skyblue1", "after" = "yellowgreen", "source" = "gray66")
)

gg <- gg + ggplot2::scale_alpha_manual(
  values = c("before" = 1, "after" = 1, "source" = 0.3)
)

gg + ggplot2::theme_linedraw() + theme(aspect.ratio = 0.5) + ggplot2::coord_flip()
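
The "strategically placed offsets" asked about above can be sketched in base R. This is only a rough illustration, not something ggplot2 provides: `stack_offsets()`, its `binwidth` and `step` arguments, and the toy vector `vals` are hypothetical names made up for this example. Values that fall into the same bin are stacked symmetrically around zero, so the offsets are deterministic rather than random.

```r
# Hypothetical helper (not part of ggplot2): deterministic, centred offsets.
# Values falling into the same bin are stacked symmetrically around zero.
stack_offsets <- function(values, binwidth, step = 1) {
  bin <- floor(values / binwidth)
  # Position of each value within its bin: 1, 2, 3, ...
  rank_in_bin <- ave(seq_along(values), bin, FUN = seq_along)
  # Number of values sharing that bin
  bin_size <- ave(seq_along(values), bin, FUN = length)
  # Shift each stack so that it is centred on zero
  (rank_in_bin - (bin_size + 1) / 2) * step
}

# Toy data (assumption, not the tsAirgap values from the question)
vals <- c(10, 11, 12, 40, 41, 90)
stack_offsets(vals, binwidth = 15)
# -1.0  0.0  1.0 -0.5  0.5  0.0
```

The resulting offsets could then be stored in a column and plotted with geom_point() instead of geom_jitter(). Points sitting close to a bin boundary can still touch their neighbours in the adjacent bin, so this is a starting point rather than a complete solution.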

So many good suggestions... Here is what Ben's suggestion would look like for my example:

I changed parts of my code to:

gg <- ggplot(data = df, aes(x = input, color = type, fill = type, alpha = type)) +
  geom_dotplot(binwidth = 15)

This would basically also work as intended for me. ggbeeswarm, as suggested by Jon, also worked great for my purpose.

Steffen Moritz
  • I don't know of anything built into ggplot2 to do this. You might look at the `ggbeeswarm` package for a few options. But it might not be what you want, since it packs all the points together toward the centerline. Alternatively, you might define a quasirandom function to do this, e.g. using the `poissoned` or `poissondisc` packages. Or, if you want to go overkill, you could create a simulation using `particles` to repel any overlapping points. – Jon Spring Jun 10 '21 at 22:46
  • Can we have a [mcve] please? I thought that `ggplot(df, aes(x, y, color = col, fill = col)) + geom_dotplot(stackdir = "center", binwidth = 0.1, alpha = 0.5)` would work (using @JonSpring's example), but I don't think it works for variable `y`. – Ben Bolker Jun 11 '21 at 02:24
  • You guys are such a great help! I clearly had difficulties finding the right terms to google. Ben's solution also works great for my purpose (see my edited question). You might also add this as an answer, Ben. The nice thing here is that it doesn't need an additional package beyond ggplot2. But I also quite like all the possibilities that come with the ggbeeswarm package. Perfect, I have gone from no satisfying solution to choosing from multiple nice solutions in one day. Thanks so much. – Steffen Moritz Jun 11 '21 at 11:18

1 Answer

I thought of a hack I really like, using ggrepel. It's normally used for labels, but nothing prevents you from making the label into a point.

df <- data.frame(x = rnorm(200),
                 col = sample(LETTERS[1:3], 200, replace = TRUE),
                 y = 1)

ggplot(df, aes(x, y, label = "●", color = col)) + # using unicode black circle
  ggrepel::geom_text_repel(segment.color = NA, 
                           box.padding = 0.01, key_glyph = "point")

A downside of this method is that ggrepel can take a lot of time for a large number of points, and it will recalculate the positions differently each time you change the plot size. A faster alternative is ggbeeswarm::geom_quasirandom, which uses a deterministic process to define jitter that looks random.

ggplot(df, aes(x, y, color = col)) +
  ggbeeswarm::geom_quasirandom(groupOnX = FALSE)
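
To give a feel for what "a deterministic process that looks random" can mean, here is a minimal base R sketch of a van der Corput sequence, a classic low-discrepancy sequence. As far as I understand, the quasirandom method behind geom_quasirandom is based on this kind of sequence, but the function below (`van_der_corput()`, a name chosen for this illustration) is an assumption-laden sketch, not ggbeeswarm's actual implementation.

```r
# Illustrative only: the i-th van der Corput number is obtained by writing i
# in the given base and mirroring its digits around the decimal point.
van_der_corput <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    value <- 0
    denom <- 1
    while (i > 0) {
      denom <- denom * base
      value <- value + (i %% base) / denom  # next digit, one place further right
      i <- i %/% base
    }
    value
  }, numeric(1))
}

van_der_corput(4)
# 0.500 0.250 0.750 0.125
```

Each new value lands in one of the largest remaining gaps, so consecutive points spread out evenly instead of clumping the way truly random jitter can, and the result is identical on every run.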

Jon Spring
  • Nice, but it only seems to work if there are not too many points. Otherwise I get "ggrepel: 95 unlabeled data points (too many overlaps). Consider increasing max.overlaps" and it just doesn't print all the points. I also don't know what it actually does behind the scenes (what is added to the plot), but the plot takes really long to load (when switching plots). Your ggbeeswarm suggestion, though, works perfectly (maybe you also want to add this to the answer or as an additional answer, so that I can accept it). Thanks a lot for your help, Jon! – Steffen Moritz Jun 11 '21 at 00:31