I am using stat_cor with ggplot to add r and p values to a scatter plot. It calculates the p value from the number of observations (rows) in my long-format data frame, so the p value corresponds to a sample with as many subjects as there are observations. The subject id variable is correctly stored as a factor when I check the structure of the data frame. Does anybody know how to fix this?

Long df example

  subject sex condition    x y
1       1   M   control  7.9 1
2       1   M     cond1 12.3 2
3       1   M     cond2 10.7 3
4       2   F   control  6.3 4
5       2   F     cond1 10.6 5
6       2   F     cond2 11.1 6

Here is the code

library(ggplot2)
library(ggpubr)
scatter <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(colour = condition)) +
  geom_smooth(method = "lm") + 
  ggtitle("title") + 
  theme(axis.text=element_text(size=14),
        axis.title=element_text(size=14,face="bold"),
        plot.title = element_text(size = 20, face = "bold"))

scatter + stat_cor(method = "pearson", label.x = -2, label.y = 3)

Calling scatter + stat_cor(method = "pearson", ...) should calculate the Pearson correlation of x and y (this page has the correct formula: http://www.stat.wmich.edu/s216/book/node122.html). However, it calculates the p value as if the sample size, n, were the number of observations (rows) in the long-format df.
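To make the sample-size point concrete, here is a minimal sketch in base R, assuming (I have not verified this in the ggpubr source) that stat_cor relies on stats::cor.test under the hood. The data frame is the six-row example above, rebuilt by hand. The degrees of freedom of the test are the number of rows minus 2, regardless of how many distinct subjects there are.

```r
# The long-format example from the question, rebuilt by hand
df <- data.frame(
  subject   = factor(c(1, 1, 1, 2, 2, 2)),
  sex       = c("M", "M", "M", "F", "F", "F"),
  condition = c("control", "cond1", "cond2", "control", "cond1", "cond2"),
  x         = c(7.9, 12.3, 10.7, 6.3, 10.6, 11.1),
  y         = 1:6
)

# Pearson test on every row: each observation is treated as independent
res <- cor.test(df$x, df$y, method = "pearson")

n_rows     <- nrow(df)                   # 6 observations
n_subjects <- length(unique(df$subject)) # 2 subjects

res$parameter  # degrees of freedom: n_rows - 2, not n_subjects - 2
res$estimate   # r
res$p.value    # p value based on n_rows
```

The subject column never enters the calculation, so its being a factor does not change n.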

b1234
  • Please include sample data and load all packages necessary to reproduce the problem. – markus Aug 02 '18 at 17:20
  • Thanks, I just revised. – b1234 Aug 02 '18 at 17:41
  • What are you even trying to take the correlation of here exactly? What's the value (or values) you would expect to see? – MrFlick Aug 02 '18 at 17:57
  • The correlation between 2 measurements and the corresponding p value. stat_cor is providing the p value of a correlation as if the number of subjects were the number of observations, so it doesn't seem to understand that observations do not equal subjects in a long-format df. – b1234 Aug 02 '18 at 18:00
  • Please tell us exactly (with formulas or code) how you would calculate this correlation and corresponding p value. – shadow Aug 02 '18 at 18:03
  • thanks, I edited based on this. – b1234 Aug 02 '18 at 18:11

1 Answer

I "fixed" this issue by reshaping the data frame to wide format. I was hoping for, and imagine there still is, a fix that gives the correct values while the data stay in long format, since most of R prefers long format.
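A rough sketch of that reshape with base R's reshape(), not necessarily the exact steps I used. The data are simulated (10 hypothetical subjects, 3 conditions), and correlating the control-condition columns is only an assumption for illustration; pick whichever pair of columns matches your question.

```r
# Simulated long-format data: 10 hypothetical subjects x 3 conditions
set.seed(1)
n_subjects <- 10
long <- data.frame(
  subject   = factor(rep(seq_len(n_subjects), each = 3)),
  condition = rep(c("control", "cond1", "cond2"), times = n_subjects),
  x         = rnorm(3 * n_subjects, mean = 10, sd = 2),
  y         = rnorm(3 * n_subjects, mean = 5, sd = 1)
)

# One row per subject; x and y become x.control, y.control, x.cond1, ...
wide <- reshape(long, idvar = "subject", timevar = "condition",
                direction = "wide")

# Correlation within one condition: n is now the number of subjects
res_wide <- cor.test(wide$x.control, wide$y.control, method = "pearson")
res_wide$parameter  # degrees of freedom: n_subjects - 2
```

Since stat_cor accepts a data argument like other ggplot2 layers, a one-row-per-subject data frame such as this could probably be passed to that layer alone (with a matching aes mapping) while the points are still drawn from the long data frame.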

b1234