Visualising a three way interaction between two continuous variables and one categorical variable in R

Question

I have a model in R that includes a significant three-way interaction between two continuous independent variables IVContinuousA, IVContinuousB, IVCategorical and one categorical variable (with two levels: Control and Treatment). The dependent variable is continuous (DV).

model <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical)

You can find the data here

I am trying to find out a way to visualise this in R to ease my interpretation of it (perhaps in ggplot2?).

Somewhat inspired by this blog post I thought that I could dichotomise IVContinuousB into high and low values (so it would be a two-level factor itself:

IVContinuousBHigh <- mean(IVContinuousB) + sd (IVContinuousB) 
IVContinuousBLow <- mean(IVContinuousB) - sd (IVContinuousB)

I then planned to plot the relationship between DV and IV ContinuousA and fit lines representing the slopes of this relationship for different combinations of IVCategorical and my new dichotomised IVContinuousB:

IVCategoricalControl and IVContinuousBHigh
IVCategoricalControl and IVContinuousBLow
IVCategoricalTreatment and IVContinuousBHigh
IVCategoricalTreatment and IVContinuousBLow

My first question is does this sound like a viable solution to producing an interpretable plot of this three-way-interaction? I want to avoid 3D plots if possible as I don't find them intuitive... Or is there another way to go about it? Maybe facet plots for the different combinations above?

If it is an ok solution, my second question is how to I generate the data to predict the fit lines to represent the different combinations above?

Third question - does anyone have any advice as to how to code this up in ggplot2?

I posted a very similar question on Cross Validated but because it is more code related I thought I would try here instead (I will remove the CV post if this one is more relevant to the community :) )

Thanks so much in advance,

Sarah

Note that there are NAs (left as blanks) in the DV column and the design is unbalanced - with slightly different numbers of datapoints in the Control vs Treatment groups of the variable IVCategorical.

FYI I have the code for visaualising a two-way interaction between IVContinuousA and IVCategorical:

A<-ggplot(data=data,aes(x=AOTAverage,y=SciconC,group=MisinfoCondition,shape=MisinfoCondition,col = MisinfoCondition,))+geom_point(size = 2)+geom_smooth(method='lm',formula=y~x)

But what I want is to plot this relationship conditional on IVContinuousB....

Seems to me you would need to plot each continuous variables at specific quantiles (perhaps 0.25, 0.5, 0.75 of the other variable and the categorical variable. The "right" way to show this would be with 3d graphics but sadly ggplot2 doesn't do 3d. — IRTFM, Mar 03 '18 at 17:48
`sjPlot` ([here](https://cran.r-project.org/web/packages/sjPlot/index.html) and [here](http://www.strengejacke.de/sjPlot/)) has many convenience functions for model plotting. Many nice vignettes, e.g. a section on three-way interactions [here](https://cran.r.project.org/web/packages/sjPlot/vignettes/plot_interactions.html). See also [`effects package`](https://cran.r-project.org/web/packages/effects/index.html). — Henrik, Mar 03 '18 at 18:56
Thank you @Henrik! The vignette on three-way interactions that you linked doesn't seem to work - could you repost it? — Sarah, Mar 04 '18 at 15:35
OK! I try to paste the 'raw' link: https://cran.r-project.org/web/packages/sjPlot/vignettes/plot_interactions.html — Henrik, Mar 04 '18 at 15:37

eipi10 · Accepted Answer · 2018-03-04T21:39:15.840

Here are a couple of options for visualizing the model output in two dimensions. I'm assuming here that the goal here is to compare Treatment to Control

library(tidyverse)
  theme_set(theme_classic() +
          theme(panel.background=element_rect(colour="grey40", fill=NA))

dat = read_excel("Some Data.xlsx")  # I downloaded your data file

mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data=dat)

# Function to create prediction grid data frame
make_pred_dat = function(data=dat, nA=20, nB=5) {
  nCat = length(unique(data$IVCategorical))
  d = with(data, 
           data.frame(IVContinuousA=rep(seq(min(IVContinuousA), max(IVContinuousA), length=nA), nB*2),
                      IVContinuousB=rep(rep(seq(min(IVContinuousB), max(IVContinuousB), length=nB), each=nA), nCat),
                      IVCategorical=rep(unique(IVCategorical), each=nA*nB)))

  d$DV = predict(mod, newdata=d)

  return(d)
}

`IVContinuousA` vs. `DV` by levels of `IVContinuousB`

The roles of IVContinuousA and IVContinuousB can of course be switched here.

ggplot(make_pred_dat(), aes(x=IVContinuousA, y=DV, colour=IVCategorical)) + 
  geom_line() +
  facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinousB") +
  labs(colour="")

You can make a similar plot without faceting, but it gets difficult to interpret as the number of IVContinuousB levels increases:

ggplot(make_pred_dat(nB=3), 
       aes(x=IVContinuousA, y=DV, colour=IVCategorical, linetype=factor(round(IVContinuousB,2)))) + 
  geom_line() +
  #facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinousB") +
  labs(colour="", linetype="IVContinuousB") +
  scale_linetype_manual(values=c("1434","11","62")) +
  guides(linetype=guide_legend(reverse=TRUE))

Heat map of the model-predicted difference, DV treatment - DV control on a grid of `IVContinuousA` and `IVContinuousB` values

Below, we look at the difference between treatment and control at each pair of IVContinuousA and IVContinuousB.

ggplot(make_pred_dat(nA=100, nB=100) %>% 
         group_by(IVContinuousA, IVContinuousB) %>% 
         arrange(IVCategorical) %>% 
         summarise(DV = diff(DV)), 
       aes(x=IVContinuousA, y=IVContinuousB)) + 
  geom_tile(aes(fill=DV)) +
  scale_fill_gradient2(low="red", mid="white", high="blue") +
  labs(fill=expression(Delta*DV~(Treatment - Control)))

Thank you so, so much - this is absolutely fantastic. The only thing I am missing when I try it myself are the black borders around each of the panels of the facet (if that makes sense!). Is there anyway to add these? I can paste an image of the output I get in my original question if you need clarification :) Thank you so much again — Sarah, Mar 04 '18 at 11:36
Apologies - forgive one more question! I tried to apply the code for the predict function and then the facets graph to the same data but instead of using IVContinuousB I am using IVContinuousC (I updated the shared data to include IVContinuousC!). When I try this however, I am getting the error code "Error in seq.default(min(IVContinuousC), max(IVContinuousC), : 'from' must be a finite number" after calling the ggplot function - can you think of anyway around this error? Thank you so much again! — Sarah, Mar 04 '18 at 11:57
For the second question, are there missing values in `IVContinuousC`? If so, `min` and `max` functions will return `NA`, which would cause the error you're getting. For example, run `seq(NA, 10, 5)`. To avoid this, add `na.rm=TRUE` as an argument to `min` and `max`. — eipi10, Mar 04 '18 at 21:31
For your first question: The black borders are not part of the `theme_classic()` theme. I added them back in with an extra theme statement in the code I actually ran, but forgot to include that in the code I posted (sorry about that). See the updated `theme_set` statement in my answer, which adds the borders back into the plot panels. — eipi10, Mar 04 '18 at 21:38
(continued) Note that the axis lines are separate from the panel borders (in my answer, note that the panel border lines are a bit lighter and thinner than the axis lines). You can change the axis lines with the `axis.line` theme element (e.g., `axis.line=element_line(colour="red", size=2)`. Type `?theme` for more info on setting theme elements. — eipi10, Mar 04 '18 at 21:40

Daniel · Answer 2 · 2018-03-03T18:39:06.073

If you really want to avoid 3-d plotting, you could indeed turn one of the continuous variables into a categorical one for visualization purposes.

For the purpose of the answer, I used the Duncan data set from the package car, as it is of the same form as the one you described.

library(car)
# the data
data("Duncan")

# the fitted model; education and income are continuous, type is categorical
lm0 <- lm(prestige ~ education * income * type, data = Duncan)

# turning education into high and low values (you can extend this to more 
# levels)
edu_high <- mean(Duncan$education)  + sd(Duncan$education)
edu_low <- mean(Duncan$education)  - sd(Duncan$education)

# the values below should be used for predictions, each combination of the 
# categories must be represented:
prediction_mat <- data.frame(income = Duncan$income, 
                         education = rep(c(edu_high, edu_low),each = 
                         nrow(Duncan)),
                         type = rep(levels(Duncan$type), each = 
                         nrow(Duncan)*2))


predicted <- predict(lm0, newdata = prediction_mat)


# rearranging the fitted values and the values used for predictions
df <- data.frame(predicted,
             income = Duncan$income,
             edu_group =rep(c("edu_high", "edu_low"),each = nrow(Duncan)),
             type = rep(levels(Duncan$type), each = nrow(Duncan)*2))


# plotting the fitted regression lines
ggplot(df, aes(x = income, y = predicted, group = type, col = type)) + 
geom_line() + 
facet_grid(. ~ edu_group)

Visualising a three way interaction between two continuous variables and one categorical variable in R

2 Answers2

`IVContinuousA` vs. `DV` by levels of `IVContinuousB`

Heat map of the model-predicted difference, DV treatment - DV control on a grid of `IVContinuousA` and `IVContinuousB` values

Linked

Visualising a three way interaction between two continuous variables and one categorical variable in R

2 Answers2

IVContinuousA vs. DV by levels of IVContinuousB

Heat map of the model-predicted difference, DV treatment - DV control on a grid of IVContinuousA and IVContinuousB values

Linked

`IVContinuousA` vs. `DV` by levels of `IVContinuousB`

Heat map of the model-predicted difference, DV treatment - DV control on a grid of `IVContinuousA` and `IVContinuousB` values