0

I am currently working on fitting a glm() on a dataset with a proportion variable (C3_proportion) along a continuous gradient (elevation) and I am trying to obtain the x values for y=c(0.25,0.5,1). After some research, it seems like a binomial glm would be my best option. So here are my data, the model, and the ggplot2 plot I was able to come up with.

# Library load
library(ggplot2)
library(ggpmisc)

# Example data
df <- data.frame("mountain"="Sumapaz",
                 "C3_proportion"=c(0.2692308,0.2692308,0.2692308,0.2692308,0.3090909,0.3333333,0.3333333,0.3333333,0.3333333,0.3333333,0.3333333,0.3333333,0.3333333,0.3333333,0.3461538,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.3555556,0.3939394,0.3939394,0.3939394,0.3939394,0.3939394,0.3939394,0.3939394,0.3939394,0.3939394,0.4473684,0.4827586,0.4827586,0.4827586,0.4827586,0.4827586,0.4827586,0.4827586,0.4827586,0.4827586,0.6341463,0.7058824,0.7058824,0.7058824,0.7058824,0.7058824,0.7058824,0.7058824,0.7058824,0.7058824,0.7674419,0.8666667,0.8666667,0.8666667,0.8666667,0.8666667,0.8666667,0.8666667,0.8666667,0.8666667,0.875,0.8636364,0.8636364,0.8636364,0.8636364,0.8636364,0.8636364,0.8636364,0.8636364,0.8636364,0.8636364),
                 "elevation"=c(300,350,400,450,500,550,600,650,700,750,800,850,900,950,1000,1050,1100,1150,1200,1250,1300,1350,1400,1450,1500,1550,1600,1650,1700,1750,1800,1850,1900,1950,2000,2050,2100,2150,2200,2250,2300,2350,2400,2450,2500,2550,2600,2650,2700,2750,2800,2850,2900,2950,3000,3050,3100,3150,3200,3250,3300,3350,3400,3450,3500,3550,3600,3650,3700,3750,3800,3850,3900,3950,4000))

# Model
fit=glm(C3_proportion~poly(elevation, 2),data=df, family = "binomial")

# Plot 
ggplot(df, aes(x=elevation, y=C3_proportion)) +
  geom_point(aes(colour=mountain),alpha = 0.8) +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), formula= y ~ poly(x,2), se = F,lwd=0.5, colour="black")+
  stat_smooth(method = "glm", method.args = list(family = "binomial"), formula= y ~ poly(x,2), colour = "black", lty="dotted", geom = "ribbon", fill = NA)+
  xlab("Elevation (m)") + ylab("Proportion of C3 genera")

ggplot2 scatterplot with C3 Poaceae proportion along the elevation gradient for Mount Sumapaz, Colombia

However, whenever I try to predict values, I cant run it for the y axis,

> predict(fit, newdata = data.frame(C3_proportion=c(0.25,0.5,1)))
Error in poly(elevation, 2, coefs = list(alpha = c(2150, 2150), norm2 = c(1,  : 
  object 'elevation' not found

...and even when I tried to do so with the x values, the predicted elevations are not consistent with the ones shown by the fitted curve on the plot.

# Predict values
predict(fit, newdata = data.frame(elevation=c(1000,2000,3000)))
#1           2           3 
#-0.88574136  0.07365174  1.03304484 
Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32
  • Look up the parameter 'type' for the predict function – Dason Oct 02 '21 at 14:45
  • Thanks, @Dason. Setting the parameter type="response" solves the inconsistency between plotted and fitted values. Any hints on how to predict the values for x using the y_axis? Should I simply switch orders in the glm call? – J. D. Vidal Oct 02 '21 at 14:56
  • Switching won't work. But there are numerical ways to just invert or approximate the values you're talking about – Dason Oct 02 '21 at 15:20

1 Answers1

0

With a generalised linear model, predictions can be on the scale of the response or the linear predictor (type = "link"). In R, the default is the scale of the linear predictor. The alternative, "response", is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities.

Robert Long
  • 5,722
  • 5
  • 29
  • 50