
I'm trying to plot the correlation between two 7-point variables. The correlation is .72, but the plot shows only a single dot at each integer coordinate, so the points form an evenly spaced grid of rows and columns. I double-checked that both variables are numeric (they are). I tried several different ways of graphing it, and the dots always come out the same way. Any ideas?

Code:

library("ggpubr")
ggscatter(plotdata, x = "TID", y = "PID7", use = "complete.obs",
      add = "reg.line", conf.int = TRUE, 
      cor.coef = TRUE, cor.method = "pearson",
      xlab = "X", ylab = "Y")

[Figure: scatter plot]

Sample data:

dput(head(plotdata, 20))

structure(list(plotdata.TID = c(7, 1, 3, 5, 5, 7, 7, 6, 1, 4, 
1, 4, 1, 1, 7, 7, 1, 1, 1, 4), plotdata.PID7 = c(1, 1, 3, 6, 
6, 7, 6, 6, 2, 7, 1, 4, 1, 1, 7, 6, 2, 3, 2, 4)), row.names = c(NA, 
20L), class = "data.frame")
herzka

1 Answer


Off the top of my head, this sounds like a consequence of the data being discrete rather than continuous (especially since you said "7-point variables"). With only 7 possible values on each axis, there are at most 49 distinct positions, so each point is plotted exactly on top of every other point that shares its values. That makes it impossible to see how many observations sit at each position.

Try adding some jitter to the scatterplot to get a better sense of the distribution. Jitter adds a small random offset to the position of each point, so overlapping points spread out and become visible.

Here's how to do that:

https://ggplot2.tidyverse.org/reference/position_jitter.html
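As a rough sketch of the jitter approach, the snippet below rebuilds the 20 sample rows from the question and plots them with geom_jitter(). The column names TID and PID7 are an assumption (the dput() output shows plotdata.TID and plotdata.PID7), and the width and height of 0.2 are illustrative values, not recommendations from the original answer.

```r
library(ggplot2)

# Sample rows from the question's dput() output.
# Assumption: columns renamed to TID / PID7 to match the ggscatter() call.
plotdata <- data.frame(
  TID  = c(7, 1, 3, 5, 5, 7, 7, 6, 1, 4, 1, 4, 1, 1, 7, 7, 1, 1, 1, 4),
  PID7 = c(1, 1, 3, 6, 6, 7, 6, 6, 2, 7, 1, 4, 1, 1, 7, 6, 2, 3, 2, 4)
)

p <- ggplot(plotdata, aes(x = TID, y = PID7)) +
  # Shift each point by a small random amount so duplicates become visible.
  # width/height = 0.2 are illustrative; keep them well under 0.5 so points
  # stay visually attached to their integer value.
  geom_jitter(width = 0.2, height = 0.2) +
  # The regression line is fitted to the original, unjittered values.
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_continuous(breaks = 1:7) +
  scale_y_continuous(breaks = 1:7)

print(p)
```

Jitter only changes where the points are drawn. The underlying data, and therefore the correlation, stay the same.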

  • Using +geom_jitter() worked like a charm! Thank you! – herzka May 06 '19 at 21:02
  • Great! Please mark my answer as accepted to indicate to others that it worked for you. And another method that can also work is to set the points to be translucent by reducing the alpha, but I find jitter usually works better when there are so few possible values. – perfectlyGoodInk May 06 '19 at 21:16
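
The translucency alternative mentioned in the last comment could look roughly like this. It is a minimal sketch using the same sample rows, again assuming the column names TID and PID7, with alpha = 0.2 and size = 4 chosen only for illustration.

```r
library(ggplot2)

# Sample rows from the question's dput() output.
# Assumption: columns renamed to TID / PID7 to match the ggscatter() call.
plotdata <- data.frame(
  TID  = c(7, 1, 3, 5, 5, 7, 7, 6, 1, 4, 1, 4, 1, 1, 7, 7, 1, 1, 1, 4),
  PID7 = c(1, 1, 3, 6, 6, 7, 6, 6, 2, 7, 1, 4, 1, 1, 7, 6, 2, 3, 2, 4)
)

p_alpha <- ggplot(plotdata, aes(x = TID, y = PID7)) +
  # Semi-transparent points: positions with more stacked observations
  # render darker. alpha and size are illustrative values.
  geom_point(alpha = 0.2, size = 4) +
  scale_x_continuous(breaks = 1:7) +
  scale_y_continuous(breaks = 1:7)

print(p_alpha)
```

With this approach the points stay on the grid, and darkness stands in for the count at each position.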