0

I am trying to model the relationship between my response 'crop coverage [%]' ~ weed coverage [%] + Soil Moisture [%] using R. Since I am dealing with proportions, I chose to do a beta regression. I was told that in order to better fit and visualize the model, using the mean of weed_coverage would be a good idea. However, when I do that, I get following error:

betareg (crop_coverage ~ soil_moisture + weed_coverage_mean, data = df) -> model_a

Error in optim(par = start, fn = loglikfun, gr = gradfun, method = method, : non-finite value supplied by optim

Why do I get this error? And is that really the best way to fit and to visualize this model, since weed_coverage and soil_moisture are both continuous variables? Thanks a lot in advance.

My data:

df <- structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("2021-03-17", 
"2021-04-07", "2021-04-13", "2021-04-27", "2021-05-11", "2021-05-27"
), class = "factor"), weed_coverage = c(0, 0, 0, 1.7, 1, 5, 0, 
0, 0.1, 0.2, 1, 2.8, 2.5, 1, 1, 5, 0, 0, 0.9, 0.7, 0, 1.1, 0.5, 
0.5, 0, 0, 0.5, 4, 0, 0.3, 0.8, 4, 1, 2, 2, 6, 0.2, 5, 0, 0, 
3, 1, 0, 2, 0, 0, 0, 3, 3, 0), soil_moisture = c(36.28, 37.6, 
38.55, 34.38, 34.02, 34.88, 34.92, 38.12, 35.38, 36.92, 27.15, 
24.95, 21.38, 22.95, 27.65, 25.7, 27.02, 32.1, 27.18, 26, 14.97, 
15.25, 17.02, 16.12, 15.32, 14.3, 14.5, 12.45, 13.07, 15.4, 14.9, 
12, 16.85, 17.15, 18.52, 10.68, 13.82, 9.5, 15.32, 10.97, 14.8, 
17.05, 26.75, 14.8, 25.75, 19.18, 18.12, 14.22, 18.95, 24.38), 
    crop_coverage = c(0.38, 0.6, 0.75, 0.5, 0.4, 0.48, 0.74, 
    0.75, 0.27, 0.45, 0.65, 0.3, 0.4, 0.38, 0.45, 0.58, 0.48, 
    0.75, 0.58, 0.4, 0.9999, 0.7, 0.75, 0.7, 0.85, 0.78, 0.7, 
    0.91, 0.2, 0.6, 0.95, 0.85, 0.6, 0.7, 0.75, 0.9, 0.8, 0.85, 
    0.75, 0.96, 0.85, 0.85, 0.75, 0.73, 0.68, 0.7, 0.97, 0.7, 
    0.75, 0.74), weed_coverage_mean = c(1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 1.256, 
    1.256, 1.256)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-50L))
Effigy
  • 145
  • 7
  • 1
    You can't fit a model with a constant value for one of the predictors (it will be confounded with the model intercept). If you can clarify/explain the statement "I was told that in order to better fit and visualize the model, using the mean of weed_coverage would be a good idea" a little more that might be helpful ... – Ben Bolker Mar 11 '22 at 16:06
  • Hey Ben, check out the first comment by Achim Zeileis of the answer to a similar question of me on cross validated. Here is the link. https://stats.stackexchange.com/questions/566640/beta-regression-shows-a-weird-plot I have no idea how to do it. My code so far is: ggplot(df, aes(x = soil_moisture, y = crop_coverage)) + geom_point(size = 2, shape = 21)+ geom_line(aes(y = predict(model_a, df)), color = "blue") – Effigy Mar 11 '22 at 16:25
  • Achim Zeleis means you should use the mean of weed cover just when **plotting the predicted values**, not when fitting the model ... – Ben Bolker Mar 11 '22 at 16:39
  • Thanks. How do I do that? Where in the code do I put it? – Effigy Mar 11 '22 at 16:42

1 Answers1

1

You should fit the model with the original data, then use the mean value of the non-focal predictors when predicting values to use in the plot. For example:

fit

library(betareg)
m <- betareg (crop_coverage ~ soil_moisture + weed_coverage, data = df)

construct prediction frame and predict

pframe <- with(df, data.frame(soil_moisture = seq(min(soil_moisture),
                                                  max(soil_moisture),
                                                  length.out = 50),
                              weed_coverage = mean(weed_coverage)))
pframe$crop_coverage <- predict(m, newdata = pframe, type = "response")

plot

plot(crop_coverage ~ soil_moisture, data = df)
with(pframe, lines(soil_moisture, crop_coverage))

observed data with predicted line

You can do fancier stuff if you want like use expand.grid() to generate a prediction frame that computes the predicted values for several different levels of weed coverage (if you're going to do this you may want to move to ggplot2 for plotting).

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453