I'd like to plot the correlations of variables B, C and D (y-axis) against A (x-axis), to obtain a plot similar to this:

enter image description here

How can I obtain a separate correlation trend line and R value for each of the variables, plotted on a log-scaled y-axis? So far I have the following:

library(reshape2)  # melt()
library(ggplot2)
library(ggpubr)    # stat_cor()

A = c(3984, 5307, 3907, 3848, 4024, 6930, 6217, 6206, 5976, 4043)
B = c(18117, 16512, 17891, 17783, 12643, 12864, 10997, 12946, 12325, 12594)
C = c(9403, 9375, 7142, 6659, 8660, 9072, 7965, 8444, 6467, 6245)
D = c(443, 564, 541, 525, 503, 682, 563, 584, 639, 413)
data = data.frame(A, B, C, D)
data2 <- melt(data, id.vars = 'A', variable.name = 'letter')

ggplot(data2, aes(A, value)) +
  geom_point(aes(colour = letter)) +
  scale_y_continuous(trans = 'sqrt') +
  stat_smooth(method = lm) +
  stat_cor(aes(color = letter), label.x = 3)

ggplot(data2, aes(A, value)) +
  geom_point(aes(colour = letter)) +
  stat_cor(method = "pearson", label.x = 4000, label.y = 1.9) +
  stat_smooth(method = lm) +
  facet_wrap(letter ~ .)

Ecg

1 Answer

I had not used stat_cor before, so I had to do some trial and error. There may be a better way of getting the plot the way you need.

The first issue in your code: because you were setting the colour aesthetic inside geom_point, it was not being passed to the other layers (stat_cor and stat_smooth). To fix this, set colour inside the ggplot call so that it gets inherited by every layer added with +.
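To make the inheritance point concrete, here is a minimal sketch on made-up toy data (the data frame `toy` and its columns `x`, `y`, `g` are hypothetical, not from the question). It uses ggplot2's `layer_data()` to count how many groups the smoothing layer fits in each case.

```r
library(ggplot2)

# Toy data in long format (made-up values, for illustration only)
toy <- data.frame(
  x = rep(1:5, times = 2),
  y = c(1, 2, 3, 4, 5, 10, 8, 6, 4, 2),
  g = rep(c("up", "down"), each = 5)
)

# colour mapped only inside geom_point(): the smoother sees a single group
p_local <- ggplot(toy, aes(x, y)) +
  geom_point(aes(colour = g)) +
  stat_smooth(method = "lm", formula = y ~ x)

# colour mapped in ggplot(): every layer inherits it, so one fit per group
p_global <- ggplot(toy, aes(x, y, colour = g)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# Number of separate fits computed by the smoothing layer (layer 2)
n_local  <- length(unique(layer_data(p_local, 2)$group))
n_global <- length(unique(layer_data(p_global, 2)$group))
```

Comparing `n_local` with `n_global` shows whether the smoother was split by group; the same reasoning should apply to stat_cor, since it is also a layer that inherits the plot-level aesthetics.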

Also, I had to create another data.frame holding the label positions (label.x and label.y) for each group. It worked in this case, but I don't think it would work in all cases (e.g. if the lines crossed). In general you would need to set the positions more or less manually, using an approach similar to what I did.

library(dplyr)

# For each letter, take the row where A is largest; value is square-rooted
# to match the trans = 'sqrt' y-axis
df.pos.labels <- data2 %>% group_by(letter) %>% slice_max(A) %>%
  mutate(value = sqrt(value))

ggplot(data2, aes(x=A, y=value, colour=letter)) + geom_point()  +
  scale_y_continuous(trans='sqrt') + 
  ggpubr::stat_cor(method = "pearson", hjust=0.5, vjust=0, label.x = df.pos.labels$A, 
                   label.y=df.pos.labels$value) +
  stat_smooth(method='lm', formula = 'y ~ x') +
  coord_cartesian(clip = 'off')

This creates the trend lines and the correlation labels (R and p values), with colours mapped to the groups. If you want the labels to all be the same colour (e.g. black), map the colour aesthetic inside geom_point and stat_smooth separately and use the group aesthetic in the main ggplot call.

ggplot(data2, aes(x=A, y=value, group=letter)) + geom_point(aes(colour = letter))  +
  scale_y_continuous(trans='sqrt') + 
  ggpubr::stat_cor(method = "pearson", hjust=0.5, vjust=0, label.x = df.pos.labels$A, 
                   label.y=df.pos.labels$value) +
  stat_smooth(aes(colour = letter), method='lm', formula = 'y ~ x') +
  coord_cartesian(clip = 'off')

Note the coord_cartesian(clip = 'off'), which keeps the labels from being clipped at the edge of the plotting area. You may need to move the legend because of this. Alternatively, you could widen the x-axis limits so that the labels fit inside the plotting area.
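The x-axis alternative can be sketched as follows. This is a minimal sketch, assuming the question's data and that the reshape2, dplyr, ggplot2 and ggpubr packages are installed; the upper limit of 7500 is a value chosen for this particular data, not a general rule.

```r
library(reshape2)  # melt()
library(dplyr)     # group_by(), slice_max(), mutate()
library(ggplot2)
library(ggpubr)    # stat_cor()

# Data from the question
A <- c(3984, 5307, 3907, 3848, 4024, 6930, 6217, 6206, 5976, 4043)
B <- c(18117, 16512, 17891, 17783, 12643, 12864, 10997, 12946, 12325, 12594)
C <- c(9403, 9375, 7142, 6659, 8660, 9072, 7965, 8444, 6467, 6245)
D <- c(443, 564, 541, 525, 503, 682, 563, 584, 639, 413)
data2 <- melt(data.frame(A, B, C, D), id.vars = "A", variable.name = "letter")

# One label position per letter: the row with the largest A,
# with value square-rooted to match the sqrt-transformed y-axis
df.pos.labels <- data2 %>%
  group_by(letter) %>%
  slice_max(A, n = 1) %>%
  mutate(value = sqrt(value))

# Same plot as above, but instead of turning clipping off, the upper
# x limit is widened (NA keeps the lower limit computed from the data)
p <- ggplot(data2, aes(x = A, y = value, colour = letter)) +
  geom_point() +
  scale_y_continuous(trans = "sqrt") +
  stat_cor(method = "pearson", hjust = 0.5, vjust = 0,
           label.x = df.pos.labels$A, label.y = df.pos.labels$value) +
  stat_smooth(method = "lm", formula = y ~ x) +
  xlim(c(NA, 7500))

print(p)
```

If the labels still run past the panel for other data, the upper limit would need to be raised accordingly.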

kikoralston
  • This is great, thanks! About the equation position, what if the limits are set via xlim and ylim? – Ecg Jan 11 '21 at 20:02
  • Ah I see you mentioned it at the bottom. Fantastic, will give it a go, thanks! – Ecg Jan 11 '21 at 20:03
  • You're welcome! You could remove `coord_cartesian(clip = 'off')` and use something like `xlim(c(NA, 7500))` (which leaves the lower bound the same and sets the upper bound to 7500). Using this data it worked for me. – kikoralston Jan 11 '21 at 20:12