4

Here's the data I'm working with:

data <- data.frame(id = rep(1:3, each = 30), 
           intervention = rep(c("a","b"),each= 2, times=45),
           area = rep(1:3, times=30), 
           "dv1" = rnorm(90, mean =10, sd=7),
           "dv2" = rnorm(90, mean =5, sd=3),
           outcome = rbinom(90, 1, prob=.5))

data$id <- as.factor(data$id)
data$intervention <- as.factor(data$intervention)
data$area <- as.factor(data$area)
data$outcome <- as.factor(data$outcome)

I'm trying to make sigmoidal plots for this mixed effects logistic regression model:

library(lmer4)
glmer(
  outcome1 ~ dv1 + (1 | id/area), 
  data = data, 
  family = binomial(link = "logit")
)

Here's what I tried and failed with:

library(ggplot2)
ggplot(data, aes(x=dv1, y=outcome1, color=factor(area))) + 
  facet_wrap(~id) +
  geom_point() + 
  stat_smooth(method="glm", method.args=list(family="binomial"), color="black", se=F)

Info    
`geom_smooth()` using formula 'y ~ x'
Warning 
Computation failed in `stat_smooth()`: y values must be 0 <= y <= 1
Computation failed in `stat_smooth()`: y values must be 0 <= y <= 1
Computation failed in `stat_smooth()`: y values must be 0 <= y <= 1

enter image description here

Additionally, is this even the right way to plot logistic regression? Should I be pulling some data from the model itself or is plotting the raw data for illustrative reasons suffice?

myfatson
  • 354
  • 2
  • 14
  • For you last question "or is plotting the raw data for illustrative reasons suffice" could you explain what you want to illustrate and to whom? It depends on the target audience as well as your own skill level. If you were to show this to me, I would immediately ask "What are the useful aspects of this plot? What can it be used for?". The answer is that it is only really useful to show a regression line. To visualize binary data, using box-, violin- or swarmplots is often more informative, or if you wish to choose variable transformations, plot smoothers using the linear predictor scale. – Oliver Feb 06 '21 at 09:39
  • @Oliver The goal would be to illustrate the distribution of the raw fit prior to being fit to the regression model. If there are obvious differences in the distribution of the data then finding a significant effect during regression feels more parsimonious and intuitive. I'm a early career biomed research scientist, my target audience is the professional scientific community. I'm no bio-statistician nor epidemiologist, so thanks for the recommendation to use box-, violing- or swarmplots instead! – myfatson Feb 06 '21 at 19:16
  • Glad I could come with some recommendations. I would suggest looking at some of the research papers out there, and how they visualize their data. The general idea often comes down to using the same methodology that we would use prior to estimating a `glm` model. The extra bit comes from visualizing some smoothers across the independent groupings of our random effects. Now binary data is of course always troublesome to visualize in a intuitive way. And sadly it quickly becomes difficult for mange groupings. Ps. I'm in finance, so there's no need to be a biostatistician to know mixed models. ;-) – Oliver Feb 06 '21 at 20:20

1 Answers1

1

It looks ok to me (not an expert) - I think the issue is that your sample data isn't particularly 'logistic' (i.e. the spread of dv1 isn't logistically related to outcome). If you modify the sample data, e.g.

library(tidyverse)
#install.packages("lme4")
library(lme4)
set.seed(123)
data <- data.frame(id = rep(1:3, each = 30), 
                   intervention = rep(c("a","b"), each= 2, times=45),
                   area = rep(1:3, times = 30), 
                   "dv1" = rep(c(rnorm(15, mean = 20, sd = 7),
                                 rnorm(15, mean = 40, sd = 7)), times = 3),
                   "dv2" = rep(c(rnorm(15, mean = 20, sd = 7),
                                 rnorm(15, mean = 40, sd = 7)), times = 3),
                   outcome = rep(c(rbinom(15, 0, prob = .95),
                                   rbinom(15, 1, prob = .95)), times = 3))

data$id <- as.factor(data$id)
data$intervention <- as.factor(data$intervention)
data$area <- as.factor(data$area)
data$outcome <- as.factor(data$outcome)

model_1 <- glmer(
  outcome ~ dv1 + (1 | id/area), 
  data = data, 
  family = binomial(link = "logit")
)

library(ggplot2)
ggplot(data, aes(x = dv1, y = as.numeric(outcome) - 1, color = factor(area))) +
  stat_smooth(method="glm", color="black", se=FALSE,
              method.args = list(family=binomial)) + 
  geom_point() +
  facet_wrap(~id)

The plot looks more like you'd expect:

example_1.png

(Note: these three panels are the same, as I repeated the sample data 3 times, but you get the idea)

If you want to plot the model predictions, this tutorial gives a straightforward overview: https://mgimond.github.io/Stats-in-R/Logistic.html

jared_mamrot
  • 22,354
  • 4
  • 21
  • 46