
EDIT: I agree with Roland that I didn't need to spend all that text on the Shiny-specific stuff. Removed and added the gist of what the data frame should look like after thinking about it more.


EDIT_2: While the shiny stuff wasn't really relevant to the question, I created an example using Roland's solution below, in case passersby are interested in seeing what I was doing with this. Have patience with the graphic loading; it can be a bit slow.


I'm trying to plot a set of predicted modeling data in R and shiny. I have four variables whose interactions I'd like to show in contour plots. For each variable, I have the user define a range as well as a hold value. There are two cases for each variable:

  • Used as one of the axis variables: The range determines over what values I predict new responses in my model for that variable
  • Not featured directly in the plot: The hold values are used to set non-featured variables at a constant value so that my prediction for the other two variables yields a single z value for each x and y combination

I'm running into issues with handling the data in a way that's friendly to producing a grid of contour plots. I'd ideally like a screen showing all 6 pairwise interactions between the four variables (4C2).

I essentially need two sets of data:

  • One in the original form of the input data set used to train the model (so that I can do predict(model, newData) to obtain the output column to be used for z values)
  • A subset/rearranged form of the previous for plotting/facetting

For the facet-friendly version, this is what I'd need (in my mind; perhaps there is a better way):

| x        | y        | z | col | row |
|----------+----------+---+-----+-----|
| var1_min | var2_min | z |   1 |   1 |
| var1_min | ...      | z |   1 |   1 |
| var1_min | var2_max | z |   1 |   1 |
| ...      | ...      | z |   1 |   1 |
| var1_max | var2_min | z |   1 |   1 |
| var1_max | ...      | z |   1 |   1 |
| var1_max | var2_max | z |   1 |   1 |
|----------+----------+---+-----+-----|
| var1_min | var3_min | z |   1 |   2 |
| var1_min | ...      | z |   1 |   2 |
| var1_min | var3_max | z |   1 |   2 |
| ...      | ...      | z |   1 |   2 |
| var1_max | var3_min | z |   1 |   2 |
| var1_max | ...      | z |   1 |   2 |
| var1_max | var3_max | z |   1 |   2 |
|----------+----------+---+-----+-----|
| ...      | ...      | z |     |     |
|----------+----------+---+-----+-----|
| var3_min | var4_min | z |   3 |   2 |
| var3_min | ...      | z |   3 |   2 |
| var3_min | var4_max | z |   3 |   2 |
| ...      | ...      | z |   3 |   2 |
| var3_max | var4_min | z |   3 |   2 |
| var3_max | ...      | z |   3 |   2 |
| var3_max | var4_max | z |   3 |   2 |
|----------+----------+---+-----+-----|

In this way, I have my x and y values, the column of the corresponding predicted response from the model, and row/column indices to create a facet_grid with (either 2x3 or 3x2).

For the prediction data frame, the form would have to match my initial prediction data and would be almost like a cast/wide form version of the above:

| var1      | var2      | var3      | var4      |
|-----------+-----------+-----------+-----------|
| var1_min  | var2_min  | var3_hold | var4_hold |
| var1_min  | ...       | var3_hold | var4_hold |
| var1_min  | var2_max  | var3_hold | var4_hold |
| ...       | ...       | var3_hold | var4_hold |
| var1_max  | var2_min  | var3_hold | var4_hold |
| var1_max  | ...       | var3_hold | var4_hold |
| var1_max  | var2_max  | var3_hold | var4_hold |
| ...       | ...       | ...       | ...       |
| var1_hold | var2_hold | var3_max  | var4_min  |
| var1_hold | var2_hold | var3_max  | ...       |
| var1_hold | var2_hold | var3_max  | var4_max  |

I feed that into the model to obtain the predicted response to be used as z in the contour plot.
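As a rough sketch of the two frames for a single panel (var1 vs. var2): the ranges and hold values below are made up for illustration (the var4 range in particular is hypothetical), and a simple arithmetic expression stands in where `predict(model, newData)` would go.

```r
# Hypothetical hold values; in the app these would come from user input.
holds <- c(var1 = 11.1, var2 = 17, var3 = 72, var4 = 5)

# Prediction frame: same columns as the training data.
# The two axis variables span their ranges; the others sit at their hold values.
pred_12 <- expand.grid(
  var1 = seq(0, 25, length.out = 10),
  var2 = seq(5, 45, length.out = 10),
  var3 = holds[["var3"]],
  var4 = holds[["var4"]]
)

# Stand-in for predict(model, newData); not a real model.
pred_12$z <- (pred_12$var1 + pred_12$var2) / pred_12$var3

# Facet-friendly frame: generic x/y/z columns plus the facet position.
plot_12 <- data.frame(
  x = pred_12$var1,
  y = pred_12$var2,
  z = pred_12$z,
  col = 1,
  row = 1
)
```

The remaining five panels would follow the same pattern with a different pair of axis variables and different `col`/`row` values.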

It also gets tricky as I don't always want the variables in their natural ith order, as I need to arrange them to have one common axis scale across facet rows or down facet columns (could be either, does not need to be both). I'd arrange the combinations something like this:

| x    | y    | row | column |
|------+------+-----+--------|
| var1 | var2 |   1 |      1 |
| var1 | var3 |   2 |      1 |
| var2 | var3 |   1 |      2 |
| var2 | var4 |   2 |      2 |
| var4 | var3 |   1 |      3 |
| var4 | var1 |   2 |      3 |

Now I can have three columns and two rows of facets, with column 1 sharing the var1 axis, column 2 sharing var2, and column 3 sharing var4.
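One way to keep that arrangement honest is to write it down as a small lookup table and check it in code. This is only a sketch; the name `layout` and the checks are illustrative, not part of any package.

```r
# The arrangement from the table above, as a lookup table.
layout <- data.frame(
  x      = c("var1", "var1", "var2", "var2", "var4", "var4"),
  y      = c("var2", "var3", "var3", "var4", "var3", "var1"),
  row    = c(1, 2, 1, 2, 1, 2),
  column = c(1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Each unordered pair of variables, e.g. "var3-var4" for (var4, var3).
pairs <- apply(layout[, c("x", "y")], 1,
               function(p) paste(sort(p), collapse = "-"))

# All 4C2 = 6 pairs that should be covered.
all_pairs <- combn(paste0("var", 1:4), 2, paste, collapse = "-")

# TRUE if every pair appears exactly once.
covers_all <- setequal(pairs, all_pairs) && !anyDuplicated(pairs)

# TRUE for each facet column whose panels share one x variable.
shared_x <- tapply(layout$x, layout$column,
                   function(v) length(unique(v)) == 1)
```

With this table, both `covers_all` and every entry of `shared_x` come out `TRUE`, so each facet column can share its x axis.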

I was wondering about using expand.grid manually to build the six unique combinations of variables. Once that's done, every row will feature two variables set at their hold values, so perhaps I could create a list of these six combinations and then extract the non-hold-value variables into two new columns for the plotting data frame?
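A minimal sketch of that idea, letting `combn` enumerate the six pairs instead of writing them out by hand. The ranges, hold values, and the arithmetic response are placeholders I made up; in practice the response line would be a call to `predict()` on the fitted model.

```r
# Hypothetical user inputs: a range and a hold value per variable.
ranges <- list(var1 = c(0, 25), var2 = c(5, 45),
               var3 = c(55, 90), var4 = c(1, 10))
holds  <- c(var1 = 11.1, var2 = 17, var3 = 72, var4 = 5)
n <- 10  # grid points per axis

# The six pairs of variable names.
pairs <- combn(names(holds), 2, simplify = FALSE)

plot_data <- do.call(rbind, lapply(pairs, function(p) {
  # Start with every variable at its hold value ...
  args <- as.list(holds)
  # ... then open up the two axis variables over their ranges.
  for (v in p) {
    args[[v]] <- seq(ranges[[v]][1], ranges[[v]][2], length.out = n)
  }
  # Prediction frame with the same columns as the training data.
  pred <- do.call(expand.grid, args)
  # Placeholder response; replace with predict(model, newdata = pred).
  z <- (pred$var1 + pred$var2) / pred$var3
  # Extract the two non-held variables into generic x/y columns.
  data.frame(x = pred[[p[1]]], y = pred[[p[2]]], z = z,
             facet = paste(p, collapse = " vs. "))
}))
```

Because the axis variables are picked by name rather than by testing for hold values, a grid point that happens to equal a hold value is not lost.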

Any suggestions?


Here's a hackish example I tried with three variables, trying to focus on just the interaction between var1 and c(var2, var3):

# the min/max arguments to `seq()` stand in for the user-defined range
# the hold value noted after each line stands in for the user-defined hold value

library(ggplot2)
library(reshape2) # provides melt(), used below

var1 <- seq(0, 25, length.out = 10) # hold value = 11.1
var2 <- seq(5, 45, length.out = 10) # hold value = 17
var3 <- seq(55, 90, length.out = 10) # hold value = 72

# create combinations between var1 and var2, with var3 held
test_data <- expand.grid(var1 = var1, var2 = var2, var3 = 72)

# same, but for var1 vs. var3, with var2 held
test_data <- rbind(test_data,
    expand.grid(var1 = var1, var2 = 17, var3 = var3))

# create response; analog to using predict() in real life
test_data$resp <- (test_data$var1 + test_data$var2) / test_data$var3

# facet variable placeholder and filling in
test_data$facet <- rep("", nrow(test_data))
test_data[test_data$var2 == 17, "facet"] <- "var1 vs. var3"
test_data[test_data$var3 == 72, "facet"] <- "var1 vs. var2"

# now I melted
test_data2 <- melt(test_data, id.vars = c("var1", "resp", "facet"))

Unfortunately, this left me with a bunch of cases where value was filled in with all of the hold values from var2 and var3, so I had to remove them:

test_data2 <- test_data2[test_data2$value != 72 & test_data2$value != 17, ]

Now, I was able to do this:

ggplot(test_data2, aes(x = var1, y = value, z = resp)) +
    stat_contour() + facet_grid(~ facet)

Got the ballpark I was looking for. Now I guess I need an elegant way to do my combinations and hold values without having ugly results.
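For comparison, here is a sketch that reaches the same plotting frame without `melt()` or the hold-value filter, by building each facet's rows directly. The helper `resp()` is just the stand-in response from above wrapped in a function.

```r
var1 <- seq(0, 25, length.out = 10)
var2 <- seq(5, 45, length.out = 10)
var3 <- seq(55, 90, length.out = 10)

# One prediction grid per facet, with the other variable held.
g12 <- expand.grid(var1 = var1, var2 = var2, var3 = 72)
g13 <- expand.grid(var1 = var1, var2 = 17, var3 = var3)

# Same stand-in for predict() as in the example above.
resp <- function(d) (d$var1 + d$var2) / d$var3

# Take the varying variable from each grid as `value`; no filtering needed.
test_data2 <- rbind(
  data.frame(var1 = g12$var1, value = g12$var2, resp = resp(g12),
             facet = "var1 vs. var2"),
  data.frame(var1 = g13$var1, value = g13$var3, resp = resp(g13),
             facet = "var1 vs. var3")
)
```

The result has the same columns the `ggplot()` calls in this question expect (`var1`, `value`, `resp`, `facet`), and it does not break if a range happens to contain a hold value.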


Here's an updated version now that I get how to plot on the same axes with rows/columns (since I have two columns and one row, I need the y axis to be the same for both facets which in this case is var1):

ggplot(test_data2, aes(x = value, y = var1, z = resp)) + 
    stat_contour() + facet_grid(~ facet, scales = "free_x")


Hendy
  • It's unfortunate that you mixed this with shiny code. It doesn't really matter for your real issue. It's unclear to me what your input should look like for (let's say) four variables. Regarding your last paragraph: It looks like you want to set the `scales` parameter of `facet_grid`. – Roland Jul 31 '13 at 07:29
  • @Roland Agreed, and sorry about that. Removed shiny-related stuff. I added a conceptual example of the variable interactions to try and illustrate what the data frames would need to look like. Re. scales, it can be done, but not in the actual example above; `scales = "free_y"` will do nothing for that case. What I needed to do was plot on the same y axis (with `var1`, the common variable) and then use `scales = "free_x"`. I added that plot. – Hendy Jul 31 '13 at 13:33
  • Dead link to the shiny example (from "edit 2"). – r2evans May 03 '18 at 00:16
  • 1
    @r2evans thanks for the heads up. I think I was on a shiny beta server back then and have since migrated to `shinyapps.io`. Updated! – Hendy May 04 '18 at 05:01

1 Answer


I really hope I've understood you correctly.

I created my own example, which actually fits a model.

#some data
set.seed(42)
x1 <- rnorm(20)
x2 <- runif(20)
x3 <- rpois(20,10)
x4 <- rexp(20)
y <- 10 + 2*x1 + 3*x2^2 + 4*x3 +5*x4 + rnorm(20, sd=0.1)

dat <- data.frame(x1, x2, x3, x4, y)

#fit the model
fit <- lm(y~x1+I(x2^2)+x3+x4, data=dat)
summary(fit)

#ranges and fixed values
fix_x <- c(0.3, 0.4, 15, 1)
min_x <- c(-3, 0, 5, 0)
max_x <- c(3, 1, 20, 7)

#all combinations
combis <- combn(seq_len(ncol(dat)-1),2)
#number of x-values 
#(warning! don't make too large since expand.grid is used)
n <- 100

#create new data and predict for each combination
newdat <- lapply(seq_len(ncol(combis)),
                 function(i) {
                   gr <- expand.grid(seq(from=min_x[combis[1,i]],
                                         to=max_x[combis[1,i]],
                                         length.out=n),
                                     seq(from=min_x[combis[2,i]],
                                         to=max_x[combis[2,i]],
                                         length.out=n))

                   newdat <- as.data.frame(matrix(nrow=nrow(gr), ncol=ncol(dat)-1))
                   newdat[,combis[,i]] <- gr
                   newdat[,-combis[,i]] <- matrix(rep(fix_x[-combis[,i]],each=nrow(gr)), nrow=nrow(gr))

                   newdat <- as.data.frame(newdat)
                   names(newdat) <- head(names(dat),-1)

                   newdat$y <- predict(fit, newdata=newdat)

                   newdat$comb <- paste(combis[,i],collapse=" vs. ")
                   #rename so rbind works as needed
                   names(newdat)[combis[,i]] <- c("xa","xb")
                   names(newdat)[-combis[,i]] <- c(paste0("fix",letters[seq_len(ncol(dat)-3)]), "y", "comb")
                   newdat
                 })

newdat <- do.call(rbind,newdat)

#plot
library(ggplot2)
ggplot(newdat, aes(x=xa, y=xb, z=y)) + 
  stat_contour() + 
  facet_wrap(~comb, scales="free", ncol=2) +
  xlab("") +
  ylab("")
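The renaming step before `rbind` in the code above relies on `rbind` matching data-frame columns by name rather than by position. A tiny check of that behaviour, using made-up frames:

```r
# Two frames with the same column names in a different order.
a <- data.frame(xa = 1:2, xb = 3:4, fixa = 0)
b <- data.frame(fixa = 9, xb = 7:8, xa = 5:6)

# rbind() on data frames lines columns up by name;
# the result takes its column order from the first frame.
both <- rbind(a, b)
```

Here `both$xa` holds 1, 2, 5, 6 and `both$fixa` holds 0, 0, 9, 9, even though the columns of `b` are in reverse order.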


Roland
  • You've understood precisely. A bit more effort than I anticipated, but I love the clever use of a sort of "index table" (an idea I was fiddling with when you posted) and `facet_wrap` instead of `facet_grid` to get around my need to specify things like `var4 vs. var1` (your solution makes order irrelevant). Thanks! – Hendy Jul 31 '13 at 14:52
  • I didn't expect to spend so much effort. It sucked me in. ;) Maybe I can use it myself one day (although I wonder if there really isn't a package that implements something similar). Do you plan to make your shiny app public? – Roland Jul 31 '13 at 14:56
  • One more neat thing I learned from your example: I didn't realize that one could `rbind` data frames with just the same colnames -- I thought they had to be in the same order as well. Brilliant! Unfortunately, I can't make it public. This is for visualizing DOE data interactively to help target optimal conditions. What I *could* do is perhaps find some similar data out there, and create an analog to what I'm doing with non-sensitive data. I thought it was a cool idea to visualize like this, and the interactive bit should be really neat. Many thanks for the help. – Hendy Jul 31 '13 at 15:08
  • I just got my account confirmation after submitting a request to have Shiny beta server access. To repay your generosity, I'll take your model and the code I ended up using and get it up on the public server. I'll come back to post a link in the next day or two so you can see it in action :) – Hendy Aug 03 '13 at 20:29
  • 1
    [The`shiny` app is up](http://spark.rstudio.com/jwhendy/interactive-contour/) using your example model! Many thanks, again. This is great. I created [another question](http://stackoverflow.com/questions/18157975/combine-geom-tile-and-facet-grid-or-facet-wrap-ggplot2) about tiling and facetting, as I'd like to create a tile background with overlaid white contour lines, as it might be easier to see the z values. Not sure. Suggestions on the visualization welcome. – Hendy Aug 10 '13 at 02:39