Hello folks!
I fitted a multivariate (bias reduced) logistic regression in R, using brglm::brglm
. I've been trying to produce a good looking plot that shows
- the propability of (y=1) along the factor x1 (line)
- an 0.95 confidence interval
- separate histograms for y=0 and y=1
I tried using popbio::log.hist.plot
, but I didn't manage to add a confidence interval and didn't achieve movement using par(mar, oma, mgp, mai)
, when grouping a number of plots.
Then I tried visreg::visreg
, which creates good looking plots with seperated rugs, but has no option of adding histograms.
I found ggplot2: How to combine histogram, rug plot, and logistic regression prediction in a single graph, but again I wasn't able to add a conficence interval to the ggplot
-graph.
At last I combined visreg
and ggplot
using some lines from the link above together with some pretty crude code of mine:
library(brglm)
library(plyr)
library(dplyr)
library(visreg)
library(ggplot2)
# creating random data
set.seed(60)
df <- cbind(y <- c(floor(runif(30, min=0, max=1.2)),
floor(runif(50, min=0, max=2))),
x1 <- c(runif(30, min=0, max=50), runif(50, min=20, max=100)),
x2 <- runif(80, min=0, max=100),
x3 <- runif(80, min=0, max=100))
colnames(df) <- c("y", "x1", "x2", "x3")
df <- data.frame(df)
# fitting the model
mod <- brglm(y ~ x1 + x2 + x3, binomial, data=df)
# setting breaks for histograms
h = df %>% group_by(y) %>%
mutate(breaks = cut(x1, breaks=seq(from= 0, to= 105, by=5),
labels=seq(from= 0, to= 100, by= 5),
include.lowest=TRUE),
breaks = as.numeric(as.character(breaks))) %>%
group_by(y, breaks) %>%
summarise(n = n()) %>%
mutate(ext = ifelse(y==0, n/sum(n), 1 - n/sum(n)))
# splitting h for y = 0 or 1, setting mins and maxs
h.split <- split(h, h$y)
min1_p <- min((h.split$"1")$ext)
max1_n <- max((h.split$"1")$n)
max0_p <- max((h.split$"0")$ext)
max0_n <- max((h.split$"0")$n)
# plottin visreg with hist
visreg (mod, "x1", ylab=NULL, xlab= "x1",
line=list(col="black", size=2), gg=T, scale="response") +
geom_segment(data=h, size=5, show.legend=FALSE,
aes(x=breaks+2.5, xend=breaks+2.5, y=y, yend=ext))+
scale_y_continuous(name = expression("p(y=1)"), limits = c(0, 1),
sec.axis = sec_axis(~ . +0 , name = "n", breaks = c(
seq(from=(max0_p/max0_n),
to=max0_p,
by=(max0_p/max0_n)),
seq(from= min1_p,
to=1-(1-min1_p)/max1_n,
by=(1-min1_p)/max1_n)),
labels = c(
seq(from=1, to=max0_n, by=1),
rev(seq(from=1, to=max1_n,
by=1)))
)) +
scale_x_continuous(limits=c(min(x1)-1,max(x1))) +
theme(panel.background = element_rect(fill = "white", colour = "grey"))
The resulting plot looks like this visreg + confedence interval + histograms with ggplot2
My questions are:
- How do I set the breaks and labels to achive histogram pillars with a width of x=10?
- Why is the secondary y-axis scaled differend for y=0 & 1 and how can I change that?
- Is there a better way to do all this?
Thank you for your attention,
Lanni