0

I have a vector of length 10k for each of the variables x and z. For each of the 10k, I have also estimated propensity scores using logit and other methods. So I have another vector that contains the predicted propensity scores.

I want to plot predicted propensity vector as the height of the 3d graph and as a function of the x and z vectors (I want something like a surface). What is the best way to go about doing this? I tried using scatter3d() from the plot3d library and it looks very bad.

Sample data: https://www.dropbox.com/s/1lf36dpxvebd7kw/mydata2.csv?dl=0

Werner Hertzog
  • 2,002
  • 3
  • 24
  • 36
user52932
  • 155
  • 10

1 Answers1

0

Updated Answer

Using the data you provided, we can bin the data, get the average propensity score by bin and plot using geom_tile. I provide code for that below. A better option would be to fit the propensity score model using the x and z vectors (and the binary treatment variable that you're predicting). Then, create a new data frame of predicted pz_p values on a complete grid of x and z values and plot that. I don't have your binary treatment variable with which to fit the model, so I haven't produced an actual plot, but the code would look something like this:

# Propensity score model
m1 = glm(treat ~ x + z, data=dat, family=binomial)

# Get propensity scores on full grid of x and z values
n = 100 # Number of grid points. Adjust as needed.
pred.dat = expand.grid(x=seq(min(dat$x),max(dat$x),length=n,
                       z=seq(min(dat$z),max(dat$z),length=n)
pred.dat$pz_p = predict(m1, newdata=pred.dat, type="response")

ggplot(pred.dat. aes(x, z, fill=pz_p)) +
  geom_tile() +
  scale_fill_gradient2(low="red", mid="white", high="blue", midpoint=0.5, limits=c(0,1))

Code for tile plot with binned data:

library(tidyverse)
theme_set(theme_classic())

dat = read_csv("mydata2.csv")

# Bin by x and z
dat = dat %>% 
  mutate(xbin = cut(x,breaks=seq(round(min(x),1)-0.05,round(max(x),1)+0.05,0.1),
                    labels=seq(round(min(x),1), round(max(x),1),0.1)),
         xbin=as.numeric(as.character(xbin)),
         zbin = cut(z,breaks=seq(round(min(z),1)-0.1,round(max(z),1)+0.1,0.2),
                    labels=seq(round(min(z),1), round(max(z),1),0.2)),
         zbin=as.numeric(as.character(zbin)))

# Calculate average pz_p by bin and then plot
ggplot(dat %>% group_by(xbin, zbin) %>% 
         summarise(pz_p=mean(pz_p)), 
       aes(xbin, zbin, fill=pz_p)) +
  geom_tile() +
  scale_fill_gradient2(low="red", mid="white", high="blue", midpoint=0.5, limits=c(0,1))

enter image description here

Original Answer

A heat map might work well here. For example:

library(ggplot2)

# Fake data
set.seed(2)
dat = expand.grid(x=seq(0,10,length=100),
                  z=seq(0,10,length=100))  
dat$ps = 1/(1 + exp(0.3 + 0.2*dat$x - 0.5*dat$z))

ggplot(dat, aes(x, z, fill=ps)) +
  geom_tile() +
  scale_fill_gradient2(low="red", mid="white", high="blue", midpoint=0.5, limits=c(0,1)) +
  coord_equal()

enter image description here

Or in 3D with rgl::persp3d:

library(rgl)
library(tidyverse)

x=unique(sort(dat$x))
z=unique(sort(dat$z))
ps=dat %>% spread(z, ps) %>% select(-1) %>% as.matrix

persp3d(x, z, ps, col="lightblue")

enter image description here

eipi10
  • 91,525
  • 24
  • 209
  • 285
  • So my x and z are not sorted. When I am using the heat map code, no color is produced. I just see grey cells. What I want is some grid and the code to locate the three coordinates automatically on the grid. I am not sure if I can sort on x and z both because of these are continuous independent random variables so if x is lets say ascending, z can be jumping around. Any solutions for this? – user52932 May 31 '18 at 04:02
  • `persp3D` needs the unique values of `x` and `y` (where your `z` is the `y` argument) sorted. Then, the `z` values (your propensity scores) should be a matrix where the columns headings are the y values (from lowest to highest) and the row names at the x values (from lowest to highest) and each value in the matrix is the PS value at the given `x` and `y` values. – eipi10 May 31 '18 at 04:17
  • sorry. I am trying to use the ggplot code instead of the 3d plot. The code for your heatmap produces grey tiles with no color and I am wondering if there is a way to resolve this. I am uploading a csv file containing my data on the main post. (added the file) – user52932 May 31 '18 at 04:31