2

I'm extremely stuck at the moment as I am trying to figure out how to calculate the probability from my glm output in R. I know the data is very insignificant but I would really love to be shown how to get the probability from an output like this. I was thinking of trying inv.logit() but didn't know what variables to put within the brackets.

The data is from occupancy study. I'm assessing the success of a hair trap method versus a camera trap in detecting 3 species (red squirrel, pine marten and invasive grey squirrel). I wanted to see what affected detection (or non detection) of the various species. One hypotheses was the detection of another focal species at the site would affect the detectability of red squirrel. Given that pine marten is a predator of the red squirrel and that the grey squirrel is a competitor, the presence of those two species at a site might affect the detectability of the red squirrel.

Would this show the probability? inv.logit(-1.14 - 0.1322 * nonRS events)

 glm(formula = RS_sticky ~ NonRSevents_before1stRS, family = binomial(link = "logit"), data = data)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.7432  -0.7432  -0.7222  -0.3739   2.0361  
Coefficients:
                    Estimate Std. Error z value Pr(>|z|)  
(Intercept)              -1.1455     0.4677  -2.449   0.0143 *
NonRSevents_before1stRS  -0.1322     0.1658  -0.797   0.4255  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
   (Dispersion parameter for binomial family taken to be 1)
Null deviance: 34.575  on 33  degrees of freedom
   Residual deviance: 33.736  on 32  degrees of freedom
  (1 observation deleted due to missingness)
   AIC: 37.736
  Number of Fisher Scoring iterations: 5*
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Cris
  • 41
  • 1
  • 1
  • 3
  • So, what have you tried so far / what exactly is it you are stuck on ? – erocoar Jan 06 '18 at 19:54
  • Sorry, I've never posted on this before so I realise I didn't phrase this very well at all. – Cris Jan 06 '18 at 19:56
  • So the glm output is showing the effect of other species in a site on the detection of red squirrels. And I know the data is showing me that the presnce of other species has no significance on the detection of red squirrel. I figured I should show the probability of detecting red squirrel. I tried to use inverse log on R but wasn't sure what variable to use with it. I am very new to GLMs (Im sure thats painfully obvious) so apologies if this is a very stupid question – Cris Jan 06 '18 at 19:59
  • You could use invlog on the fitted values of the glm object, or on your own predicted Y values – erocoar Jan 06 '18 at 20:17
  • If you're using a monitoring station over some time period, this might be modeled as a Poisson process: what is the arrival rate of red squirrels over some period of time? – Len Greski Jan 06 '18 at 20:17
  • a friend advised me that in order to get probability I should do an inv.log (-1.14 - 0.1322). Would these be the right coefficients? – Cris Jan 06 '18 at 21:01
  • That last comment should be copied and pasted into the question and then deleted. Learn to use the [edit] facility. (And read the rest of [help].) – IRTFM Jan 06 '18 at 22:25

2 Answers2

8

If you want to predict the probability of response for a specified set of values of the predictor variable:

pframe <- data.frame(NonRSevents_before1stRS=4)
predict(fitted_model, newdata=pframe, type="response")

where fitted_model is the result of your glm() fit, which you stored in a variable. You may not be familiar with the R approach to statistical analysis, which is to store the fitted model as an object/in a variable, then apply different methods to it (summary(), plot(), predict(), residuals(), ...)

  • This is obviously only a made-up example: I don't know if 4 is a reasonable value for the NonRSevents_before1stRS variable)
  • you can specify more different values to do predictions for at the same time (data.frame(NonRSevents_before1stRS=c(4,5,6,7,8)))
  • if you have multiple predictors, you have to specify some value for every predictor for every prediction, e.g. data.frame(x=4:8,y=mean(orig_data$y), ...)

If you want the predicted probabilities for the observations in your original data set, just predict(fitted_model, type="response")

You're correct that inv.logit() (from a bunch of different packages, don't know which you're using) or plogis() (from base R, essentially the same) will translate from the logit or log-odds scale to the probability scale, so

plogis(predict(fitted_model))

would also work (predict provides predictions on the link-function [in this case logit/log-odds] scale by default).

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • That's great Ben, thanks so much, for the "fitted_model", what would the input be there? Apologies for the dumb questions but my brain is feeling fried – Cris Jan 07 '18 at 00:04
7

The dependent variable in a logistic regression is a log odds ratio. We'll illustrate how to interpret the coefficients with the space shuttle autolander data from the MASS package.

After loading the data, we'll create a binary dependent variable where:

1 = autolander used, 
0 = autolander not used. 

We will also create a binary independent variable for shuttle stability:

1 = stable positioning
0 = unstable positioning. 

Then, we'll run glm() with family=binomial(link="logit"). Since the coefficients are log odds ratios, we'll exponentiate them to turn them back into odds ratios.

library(MASS)
str(shuttle)
shuttle$stable <- 0
shuttle[shuttle$stability =="stab","stable"] <- 1
shuttle$auto <- 0
shuttle[shuttle$use =="auto","auto"] <- 1

fit <- glm(use ~ factor(stable),family=binomial(link = "logit"),data=shuttle) # specifies base as unstable

summary(fit)
exp(fit$coefficients)

...and the output:

> fit <- glm(use ~ factor(stable),family=binomial(link = "logit"),data=shuttle) # specifies base as unstable
> 
> summary(fit)

Call:
glm(formula = use ~ factor(stable), family = binomial(link = "logit"), 
data = shuttle)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1774  -1.0118  -0.9566   1.1774   1.4155  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)  
(Intercept)      4.747e-15  1.768e-01   0.000   1.0000  
factor(stable)1 -5.443e-01  2.547e-01  -2.137   0.0326 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 350.36  on 255  degrees of freedom
Residual deviance: 345.75  on 254  degrees of freedom
AIC: 349.75

Number of Fisher Scoring iterations: 4

> exp(fit$coefficients)
    (Intercept) factor(stable)1 
      1.0000000       0.5802469 
> 

The intercept of 0 is the log odds for unstable, and the coefficient of -.5443 is the log odds for stable. After exponentiating the coefficients, we observe that the odds of autolander use under the condition of an unstable shuttle 1.0, and are multiplied by .58 if the shuttle is stable. This means that the autolander is less likely to be used if the shuttle has stable positioning.

Calculating probability of autolander use

We can do this in two ways. First, the manual approach: exponentiate the coefficients and convert the odds to probabilities using the following equation.

p = odds / (1 + odds) 

With the shuttle autolander data it works as follows.

# convert intercept to probability
odds_i <- exp(fit$coefficients[1])
odds_i / (1 + odds_i)
# convert stable="stable" to probability
odds_p <- exp(fit$coefficients[1]) * exp(fit$coefficients[2])
odds_p / (1 + odds_p)

...and the output:

> # convert intercept to probability
> odds_i <- exp(fit$coefficients[1])
> odds_i / (1 + odds_i)
(Intercept) 
        0.5 
> # convert stable="stable" to probability
> odds_p <- exp(fit$coefficients[1]) * exp(fit$coefficients[2])
> odds_p / (1 + odds_p)
(Intercept) 
  0.3671875 
>

The probability of autolander use when a shuttle is unstable is 0.5, and decreases to 0.37 when the shuttle is stable.

The second approach to generate probabilities is to use the predict() function.

# convert to probabilities with the predict() function
predict(fit,data.frame(stable="0"),type="response")
predict(fit,data.frame(stable="1"),type="response")

Note that the output matches the manually calculated probabilities.

> # convert to probabilities with the predict() function
> predict(fit,data.frame(stable="0"),type="response")
  1 
0.5 
> predict(fit,data.frame(stable="1"),type="response")
        1 
0.3671875 
> 

Applying this to the OP data

We can apply these steps to the glm() output from the OP as follows.

coefficients <- c(-1.1455,-0.1322)
exp(coefficients)
odds_i <- exp(coefficients[1])
odds_i / (1 + odds_i)
# convert nonRSEvents = 1 to probability
odds_p <- exp(coefficients[1]) * exp(coefficients[2])
odds_p / (1 + odds_p)
# simulate up to 10 nonRSEvents prior to RS
coef_df <- data.frame(nonRSEvents=0:10,
                  intercept=rep(-1.1455,11),
                  nonRSEventSlope=rep(-0.1322,11))
coef_df$nonRSEventValue <- coef_df$nonRSEventSlope * 
coef_df$nonRSEvents
coef_df$intercept_exp <- exp(coef_df$intercept)
coef_df$slope_exp <- exp(coef_df$nonRSEventValue)
coef_df$odds <- coef_df$intercept_exp * coef_df$slope_exp
coef_df$probability <- coef_df$odds / (1 + coef_df$odds)
# print the odds & probabilities by number of nonRSEvents
coef_df[,c(1,7:8)]

...and the final output.

> coef_df[,c(1,7:8)]
   nonRSEvents    odds probability
1            0 0.31806     0.24131
2            1 0.27868     0.21794
3            2 0.24417     0.19625
4            3 0.21393     0.17623
5            4 0.18744     0.15785
6            5 0.16423     0.14106
7            6 0.14389     0.12579
8            7 0.12607     0.11196
9            8 0.11046     0.09947
10           9 0.09678     0.08824
11          10 0.08480     0.07817
> 
Len Greski
  • 10,505
  • 2
  • 22
  • 33