Add 2D conditional distributions (in a third dimension) to 2D scatterplot in R

Question

Can anyone think of a way to add, to a 2D scatterplot, a third dimension that houses distinct distributions for Y|X=120, Y|X=140, and Y|X=160? I'm trying to include theoretical standard normals for starters (but would eventually like to include the empirical distributions).

For reference, here's a ggplot2 depiction of the 2D scatterplot:

df <- data.frame(x = c(replicate(5, 120), replicate(7, 140), replicate(6, 160)),
                 y = c(c(79, 84, 90, 94, 98), c(80, 93, 95, 103, 108, 113, 115),
                       c(102, 107, 110, 116, 118, 125)))

library(dplyr)
df <- df %>% group_by(x) %>% mutate(gp.mn = mean(y))

library(ggplot2)
( ggplot(df, aes(x = x)) + geom_point(aes(y = y)) + geom_line(aes(y = gp.mn)))

[image: scatter]

I'm essentially trying to replicate an image I created in .tpx:

[image]

I'm not tied to any particular 3D package, but plot3Drgl can be used to generate a 2D plot similar to the one above:

library(plot3Drgl)
scatter2Drgl(df$x, df$y, xlab = "x", ylab = "y")
scatter2Drgl(df$x, df$gp.mn, type = "l", add = TRUE, lwd = 4)

My hope was to use the 2D plot as a building block for a pseudo-3D rgl plot. However, incorporating the distributions into a third dimension (rgl or otherwise) is eluding me. Any thoughts?

score 1 · Accepted Answer · answered Dec 31 '14 at 20:38


Maybe this will help. (I've never been very happy with the ggplot paradigm, so I'm showing a base graphics version that someone can translate.) I also thought adding the group means to the df object confused things, so I'm only using the original df.

 aggregate(y~x,df, FUN=function(y) c(mn=mean(y),sd=sd(y))  )
#--------
    x       y.mn       y.sd
1 120  89.000000   7.615773
2 140 101.000000  12.476645
3 160 113.000000   8.294577
#----------
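png(); plot(df, xlim=c(110,170) )
# Normal curve for Y | X = 120 (mean 89, sd 7.6), drawn to the left of the points
lines( x= 120 - 100*dnorm(seq(89-2*7.6,89+2*7.6,length=25), 89, 7.6), 
       y= seq(89-2*7.6,89+2*7.6,length=25) )
# Normal curve for Y | X = 140 (mean 101, sd 12.5)
lines( x=140 - 100*dnorm(seq(101-2*12.5,101+2*12.5,length=25), 101, 12.5), 
       y= seq(101-2*12.5,101+2*12.5,length=25) );dev.off()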

[image]

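The basic strategy is to swap the roles of the two axes, so that the density values are plotted along x and the grid of y values along y. The density values are multiplied by a factor (100 here) to bring them up to the scale of the plotted points, and each curve is then "translated" by subtracting it from its group's x value so that it sits adjacent to the points it is derived from.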
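As a rough sketch of that strategy applied to all three groups in a loop: the names `grp`, `mn`, `sdv`, `xs` and `scale_factor` are illustrative and not from the answer above, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Data from the question, rebuilt so this block is self-contained
df <- data.frame(x = rep(c(120, 140, 160), times = c(5, 7, 6)),
                 y = c(79, 84, 90, 94, 98,
                       80, 93, 95, 103, 108, 113, 115,
                       102, 107, 110, 116, 118, 125))

# Per-group mean and standard deviation of y
grp <- tapply(df$y, df$x, mean)
xs  <- as.numeric(names(grp))              # 120, 140, 160
mn  <- as.vector(grp)
sdv <- as.vector(tapply(df$y, df$x, sd))

scale_factor <- 100  # assumed factor; stretches densities to the x-axis scale

plot(df, xlim = c(110, 170))
for (i in seq_along(xs)) {
  # Grid of y values spanning two standard deviations either side of the mean
  y_grid <- seq(mn[i] - 2 * sdv[i], mn[i] + 2 * sdv[i], length.out = 25)
  # Density goes on the x axis (roles swapped) and is shifted next to the group
  lines(x = xs[i] - scale_factor * dnorm(y_grid, mn[i], sdv[i]),
        y = y_grid)
}
```

Replacing `dnorm(...)` with a kernel estimate from `density()` would be one way to move toward the empirical distributions mentioned in the question, although with five to seven points per group the estimate will be rough.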


IRTFM


this is pretty helpful. Although, I was looking for a way to actually create a third axis for the distributions (while leaving the scatterplot in the (x,y)-plane) – Pat W. Dec 31 '14 at 20:54
You might look at http://www.faqoverflow.com/stats/78828.html and http://stats.stackexchange.com/questions/71260/what-is-the-intuition-behind-conditional-gaussian-distributions – IRTFM Dec 31 '14 at 21:54
these are great links, and thanks for the pseudo-3d correction...your tilted floor comment makes sense now. I edited my question above with the desired image style – Pat W. Dec 31 '14 at 23:53

I've decided it would be easier with the `persp` function. See the first two examples in the `?persp`-page. If you want those arrows you will need to provide a data-object that has their tails and head locations in some 3d vector. – IRTFM Jan 01 '15 at 00:38
For posterity, the `?persp` page that @BondedDust suggested does the trick! – Pat W. Jan 01 '15 at 13:02
I think you should add your code as another answer, uncheck mine and after a suitable interval, checkmark yours. I'm sure other people will find it useful. I certainly don't need rep that I didn't really earn. – IRTFM Jan 01 '15 at 17:28
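
The code that came out of the `persp` suggestion was not posted in this thread. The following is only a minimal sketch of how that route might look, assuming base graphics `persp()` to set up the box and `trans3d()` to project the points and curves into it. The viewing angles, axis ranges and variable names are all illustrative choices.

```r
# Data from the question, rebuilt so this block is self-contained
df <- data.frame(x = rep(c(120, 140, 160), times = c(5, 7, 6)),
                 y = c(79, 84, 90, 94, 98,
                       80, 93, 95, 103, 108, 113, 115,
                       102, 107, 110, 116, 118, 125))

grp <- tapply(df$y, df$x, mean)
xs  <- as.numeric(names(grp))
mn  <- as.vector(grp)
sdv <- as.vector(tapply(df$y, df$x, sd))

# Box limits; z is the density axis
x_rng <- c(110, 170)
y_rng <- range(df$y) + c(-20, 20)
z_max <- max(dnorm(mn, mn, sdv))   # height of the tallest density peak

# Draw a flat floor at z = 0 and keep the viewing transformation matrix
pmat <- persp(x = x_rng, y = y_rng, z = matrix(0, 2, 2),
              zlim = c(0, z_max), theta = 30, phi = 25,
              xlab = "x", ylab = "y", zlab = "density",
              ticktype = "detailed", col = "grey95")

# Scatterplot and group-mean line stay in the (x, y)-plane
points(trans3d(df$x, df$y, 0, pmat), pch = 16)
lines(trans3d(xs, mn, 0, pmat), lwd = 2)

# One normal curve per group, rising in the z direction at that group's x
for (i in seq_along(xs)) {
  y_grid <- seq(mn[i] - 3 * sdv[i], mn[i] + 3 * sdv[i], length.out = 50)
  lines(trans3d(xs[i], y_grid, dnorm(y_grid, mn[i], sdv[i]), pmat),
        col = "blue")
}
```

Arrows, as mentioned in the comments, would need their tail and head coordinates projected through `trans3d()` in the same way before being passed to `arrows()`.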
