I am using stat_cor (from ggpubr) with ggplot2 to add r and p values to a scatter plot. The p value is being calculated from the number of observations (rows) in my long-format data frame rather than from the number of subjects. It seems to ignore the long-format organization: the p value corresponds to what I would get if there were as many subjects as there are observations. The subject ID variable is recognized as a factor when I check the structure of the data frame. Does anybody know how to fix this?
Long df example
subject sex condition x y
1 1 M control 7.9 1
2 1 M cond1 12.3 2
3 1 M cond2 10.7 3
4 2 F control 6.3 4
5 2 F cond1 10.6 5
6 2 F cond2 11.1 6
Here is the code
library(ggplot2)
library(ggpubr)
scatter <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(colour = condition)) +
  geom_smooth(method = "lm") +
  ggtitle("title") +
  theme(axis.text = element_text(size = 14),
        axis.title = element_text(size = 14, face = "bold"),
        plot.title = element_text(size = 20, face = "bold"))
scatter + stat_cor(method = "pearson", label.x = -2, label.y = 3)
Calling scatter + stat_cor(method = "pearson", ...) should calculate the Pearson correlation of x and y (this page has the correct formula: http://www.stat.wmich.edu/s216/book/node122.html). Instead, it calculates the p value as if the sample size, n, were the number of observations in the long-format df.
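To make the n issue concrete, here is a minimal base-R sketch. It assumes that stat_cor effectively runs a correlation test over every row it is given, which is my understanding of it rather than something confirmed here. Subjects 3 and 4 and their values are made up for illustration (the two posted subjects alone are too few to test), and averaging within subject is just one possible way to get one row per subject, not necessarily the right analysis for repeated measures.

```r
# Hypothetical long-format data: 4 subjects x 3 conditions = 12 rows.
# Subjects 1 and 2 come from the example above; 3 and 4 are invented.
df <- data.frame(
  subject   = factor(rep(1:4, each = 3)),
  condition = rep(c("control", "cond1", "cond2"), times = 4),
  x = c(7.9, 12.3, 10.7, 6.3, 10.6, 11.1, 8.2, 11.5, 9.9, 7.1, 12.0, 10.2),
  y = c(1, 2, 3, 4, 5, 6, 2, 4, 3, 5, 6, 7)
)

# Correlation test on every row: n is the number of rows (12),
# so the degrees of freedom are n - 2 = 10.
by_row <- cor.test(df$x, df$y, method = "pearson")
by_row$parameter

# Assumed workaround: collapse to one row per subject (mean of x and y),
# then test. Now n is the number of subjects (4), so df = 4 - 2 = 2.
per_subject <- aggregate(cbind(x, y) ~ subject, data = df, FUN = mean)
by_subject <- cor.test(per_subject$x, per_subject$y, method = "pearson")
by_subject$parameter
```

If that per-subject summary is what is wanted, passing per_subject to ggplot() in place of df should make stat_cor report a p value based on the number of subjects.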