How can I find non-linear regression model starting values?

Question

I'm trying to fit a non-linear tree diameter-height model (Max & Burkhart, 1976) to my data set in R. The data set contains D, diameter at breast height (cm); H, total tree height (m); hi, section height above ground level; di, diameter at height hi; and so on.

I'm having trouble fitting the model, and I think the cause is the starting parameter values of the equation: I get "NaNs produced" errors. Tweaking the starting parameters reduced the number of errors to one, but not to zero. So I need a way to estimate starting parameters for a non-linear regression model. I looked into self-starting models but could not apply them to my equation, because of its complexity and my lack of knowledge. I'm sharing my whole data set here in the hope that someone can show me a way.

By the way, I'm not sure whether I can attach files to my question, so here is a link to my data set for anyone who wants to view or download it. I uploaded the data to Google Drive: https://drive.google.com/file/d/1q7W1bUcx4sK2G2QPte7ZtCudSLfBxpet/view?usp=sharing

# Function to compute Max & Burkhart (1976) equation
ComputeDi.MaxBurkhart <- function(hi, d, h, b1, b2, b3, b4, a1, a2){
    x <- hi / h
    x1 <- x - 1 
    x2 <- x ^ 2 - 1
    di <- d * sqrt(b1 * x1 + b2 * x2 + b3 * (a1 - x) ^ 2 * ((a1 - x) >= 0.0) + b4 * (a2 - x) ^ 2 * ((a2 - x) >= 0.0))
    return(di)
}

# Set the working directory
setwd("../Data")

# Load data and rename some variables
sylvestris <- read.csv("mydata.csv")

# Global fitting
nlmod.fp.di <- nls(di ~ ComputeDi.MaxBurkhart(hi, d, h, b1, b2, b3, b4, a1, a2),
                   data = sylvestris,
                   start = c(b1 = -2.53, b2 = 1.2, b3 = -1.5, b4 = 22, a1 = 0.72, a2 = 0.15),
                   control = nls.control(tol = 1e-07))
summary(nlmod.fp.di, correlation = TRUE)

Everything works up to this point. The NaN errors appear in the code below.

# Set seed and select names of trees
trees <- unique(sylvestris$tree) 
set.seed(15)
result.list <- list()
i <- 1
while(length(trees) > 0){
    # Draw up to 10 of the remaining trees (fewer in the last batch)
    tree.smp <- trees[sample.int(length(trees), min(10, length(trees)))]
    sylvestris.smp <- sylvestris[sylvestris$tree %in% tree.smp, ]
    fitting.ols <- try(nls(di ~ ComputeDi.MaxBurkhart(hi, d, h, b1, b2, b3, b4, a1, a2),
                           data = sylvestris.smp,
                           start = c(b1 = -2.53, b2 = 1.2, b3 = -1.5, b4 = 22, a1 = 0.72, a2 = 0.15),
                           control = nls.control(tol = 1e-07)),
                       silent = TRUE)
    if(inherits(fitting.ols, "try-error")){
            fit.smp <- data.frame(trees = paste(tree.smp, collapse = "_"), t(rep(NA, 8)))
            names(fit.smp) <- c("trees", "b1", "b2", "b3", "b4", "a1", "a2", "NS", "RSE")
    } else {
            nlmod.ols <- fitting.ols
            fit.smp <- data.frame(trees = paste(tree.smp, collapse = "_"), t(coef(fitting.ols)), NS = sum(summary(fitting.ols)$parameters[, 4] > 0.05), RSE = summary(fitting.ols)$sigma)
    }
    result.list[[i]] <- fit.smp
    i <- i + 1
    trees <- trees[!trees %in% tree.smp]        
}

I expect significant parameter estimates without any NaN errors. I'm sure the problem lies in the starting values, because this code block works perfectly with another data set; the errors only appeared when I changed the data. Thank you in advance.
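One way to narrow down where the "NaNs produced" message comes from: `sqrt()` returns `NaN` with that warning whenever its argument is negative, so any parameter set that `nls` tries during iteration and that makes the expression under the square root negative for some section would trigger it. The sketch below is only a diagnostic under that assumption. The helper name `radicand`, the grid of relative heights, and the "bad" parameter set are made up for illustration; only the starting values come from the code above.

```r
# Expression under the square root in ComputeDi.MaxBurkhart,
# written as a function of relative height x = hi / h.
# (radicand is a hypothetical helper name, not part of the original code)
radicand <- function(x, b1, b2, b3, b4, a1, a2) {
    b1 * (x - 1) + b2 * (x^2 - 1) +
        b3 * (a1 - x)^2 * ((a1 - x) >= 0) +
        b4 * (a2 - x)^2 * ((a2 - x) >= 0)
}

# Illustrative grid of relative heights from ground to tip (not the real data)
x <- (0:100) / 100

# Starting values used in the question
start <- c(b1 = -2.53, b2 = 1.2, b3 = -1.5, b4 = 22, a1 = 0.72, a2 = 0.15)
r.start <- do.call(radicand, c(list(x = x), as.list(start)))

# A deliberately bad, made-up parameter set for comparison
bad <- c(b1 = 1, b2 = 0, b3 = 0, b4 = 0, a1 = 0.72, a2 = 0.15)
r.bad <- do.call(radicand, c(list(x = x), as.list(bad)))

any(r.start < 0)                    # FALSE: the starting values are admissible on this grid
any(r.bad < 0)                      # TRUE: sqrt() would return NaN for these sections
suppressWarnings(sqrt(min(r.bad)))  # NaN
```

Replacing the grid with the real `hi / h` values of a failing sample shows whether a given parameter set is admissible for that sample. Rows with `hi > h` may be worth checking as well, since they fall outside the 0 to 1 range the grid covers.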

You can provide sample data with `dput`, since links to Drive may not be accessible to everyone and may expire in the future. – NelsonGon, Apr 26 '19 at 11:54
The nls2 package provides a brute-force method, among others, that can be used to find starting values. – G. Grothendieck, Apr 26 '19 at 14:51
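The brute-force idea mentioned in the comment above can be sketched in base R: evaluate the residual sum of squares on a grid of candidate parameter sets and use the best candidate as starting values. This is a stand-in for illustration, not the nls2 API. The data are simulated (they are not the asker's data), and the grid ranges and the names `sim`, `grid`, `rss` and `start.grid` are assumptions.

```r
# Same model function as in the question
ComputeDi.MaxBurkhart <- function(hi, d, h, b1, b2, b3, b4, a1, a2) {
    x <- hi / h
    d * sqrt(b1 * (x - 1) + b2 * (x^2 - 1) +
             b3 * (a1 - x)^2 * ((a1 - x) >= 0) +
             b4 * (a2 - x)^2 * ((a2 - x) >= 0))
}

# Simulated stand-in data: 20 trees with 10 sections each (hypothetical)
set.seed(1)
sim <- data.frame(d   = rep(runif(20, 15, 45), each = 10),
                  h   = rep(runif(20, 12, 28), each = 10),
                  rel = rep((1:10 - 0.5) / 10, times = 20))
sim$hi <- sim$rel * sim$h
sim$di <- ComputeDi.MaxBurkhart(sim$hi, sim$d, sim$h,
                                -2.53, 1.2, -1.5, 22, 0.72, 0.15) +
          rnorm(nrow(sim), sd = 0.3)

# Candidate starting values: a coarse grid over guessed ranges
par.names <- c("b1", "b2", "b3", "b4", "a1", "a2")
grid <- expand.grid(b1 = seq(-4, -1, length.out = 4),
                    b2 = seq(0.5, 2, length.out = 4),
                    b3 = seq(-3, 0, length.out = 4),
                    b4 = seq(5, 40, length.out = 4),
                    a1 = c(0.6, 0.7, 0.8),
                    a2 = c(0.1, 0.15, 0.2))

# Residual sum of squares for one candidate; Inf if any prediction is NaN
rss <- function(p) {
    pred <- suppressWarnings(
        do.call(ComputeDi.MaxBurkhart,
                c(list(hi = sim$hi, d = sim$d, h = sim$h), as.list(p))))
    if (anyNA(pred)) Inf else sum((sim$di - pred)^2)
}
grid$rss <- apply(grid[, par.names], 1, rss)

# Best candidate on the grid, as a named vector usable as nls start values
best <- grid[which.min(grid$rss), ]
start.grid <- unlist(best[par.names])
start.grid
```

The resulting `start.grid` can then be passed as `start = start.grid` to the `nls` call. Candidates that produce `NaN` predictions are ranked last rather than dropped, so the grid also shows which regions of the parameter space are inadmissible for the data.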

Jet · Answer 1 · 2019-04-26T12:28:54.027

0

You can try the nls.multstart package, which is designed to simplify the search for starting values.

You specify a range for each starting parameter; the model is then fitted from many starting points within those ranges, and the fit with the best (lowest) AIC score is kept.

edited Apr 26 '19 at 12:28

answered Apr 26 '19 at 12:23


Actually, I tried to use the nls.multstart package after seeing your comment. However, given my limited knowledge of R, I wasn't able to make it work. Can you please show me how to use it? I'm really sorry about my basic questions, but I'm a newbie in R and need some assistance. Thanks in advance. – Onur Alkan May 02 '19 at 19:33
Sorry, but I have never used it myself. From what I have seen, the function `nls_multstart` is made to work just like the base `nls` function. Instead of specifying the starting values with the `start` argument, you specify a broad range of values through lower and upper bounds, with `start_lower` and `start_upper` respectively. Sorry, I cannot help you more on that. – Jet May 03 '19 at 12:36
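A minimal sketch of what such a call might look like, assuming the nls.multstart package is installed. The data are simulated stand-ins (not the asker's data), and the bounds in `start_lower` and `start_upper` as well as `iter = 250` are guesses that would need adjusting for real data. The call is skipped if the package is not available.

```r
# Same model function as in the question
ComputeDi.MaxBurkhart <- function(hi, d, h, b1, b2, b3, b4, a1, a2) {
    x <- hi / h
    d * sqrt(b1 * (x - 1) + b2 * (x^2 - 1) +
             b3 * (a1 - x)^2 * ((a1 - x) >= 0) +
             b4 * (a2 - x)^2 * ((a2 - x) >= 0))
}

# Simulated stand-in data: 20 trees with 10 sections each (hypothetical)
set.seed(1)
sim <- data.frame(d   = rep(runif(20, 15, 45), each = 10),
                  h   = rep(runif(20, 12, 28), each = 10),
                  rel = rep((1:10 - 0.5) / 10, times = 20))
sim$hi <- sim$rel * sim$h
sim$di <- ComputeDi.MaxBurkhart(sim$hi, sim$d, sim$h,
                                -2.53, 1.2, -1.5, 22, 0.72, 0.15) +
          rnorm(nrow(sim), sd = 0.3)

fit <- NULL
if (requireNamespace("nls.multstart", quietly = TRUE)) {
    # Draw 250 random sets of starting values between the bounds, fit each,
    # and keep the fit with the lowest AIC (NULL if none converges)
    fit <- nls.multstart::nls_multstart(
        di ~ ComputeDi.MaxBurkhart(hi, d, h, b1, b2, b3, b4, a1, a2),
        data = sim,
        iter = 250,
        start_lower = c(b1 = -4, b2 = 0.5, b3 = -3, b4 = 5,  a1 = 0.6, a2 = 0.1),
        start_upper = c(b1 = -1, b2 = 2,   b3 = 0,  b4 = 40, a1 = 0.8, a2 = 0.2),
        supp_errors = "Y")  # suppress messages from failed starts
}

if (!is.null(fit)) print(coef(fit))
```

With the real data, `data = sim` would become `data = sylvestris` (or `sylvestris.smp` inside the loop), and the returned object can be used like an ordinary `nls` fit, for example with `summary()`.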
