
I am using the caret package with the nnet method. I get different results when I change the maxit parameter from 300 to 500. My understanding is that if maxit is increased, the model will run a maximum of n iterations to find a local minimum.

In my case, I get a good result when I set maxit to 300, but not when it is 500.

Note: the seed value, tune grid and number of folds are the same in both models.

1) Do I get different results because there are many local minima in the NN optimisation?

2) The higher the maxit, the better the model - true or false? (The underlying assumption is that if the model has not converged within 300 iterations, it will converge when the iteration limit is increased.)

3) How do I tune the maxit parameter?

jmuhlenkamp
Yuva

1 Answer


The nnet method you've specified uses iterative optimisation (the BFGS method from the optim() function in base R) to estimate the model parameters [1]. The optimisation stops when it converges, or when maxit iterations have been reached. If maxit is set too low, the model will fail to converge.

The BFGS method is not guaranteed to converge for all optimisation problems. Nonetheless, it is regarded as a good optimisation method. The optimisation surface is data dependent, so I won't comment on the number or nature of minima in your case. You may have hit a local minimum at 300 iterations, but there is also some stochasticity in the nnet() function (the initial weights are set randomly), so subsequent runs may differ even if all nnet() parameters are identical. Note the difference between the two subsequent nnet() runs below with identical parameters - 4.115351 versus 2.112400 at 100 iterations.

library(nnet)
data(iris)
set.seed(42)

nnet(Species ~ ., data=iris, size=10)
# weights:  83
initial  value 262.654300
iter  10 value 72.296066
iter  20 value 10.287034
iter  30 value 6.341659
iter  40 value 5.814649
iter  50 value 5.187836
iter  60 value 4.199448
iter  70 value 4.150082
iter  80 value 4.122058
iter  90 value 4.117969
iter 100 value 4.115351
final  value 4.115351
stopped after 100 iterations
a 4-10-3 network with 83 weights
inputs: Sepal.Length Sepal.Width Petal.Length Petal.Width
output(s): Species
options were - softmax modelling

# Deliberately not setting seed value before second nnet run
nnet(Species ~ ., data=iris, size=10)
# weights:  83
initial  value 201.869745
iter  10 value 67.631035
iter  20 value 11.863275
iter  30 value 6.542750
iter  40 value 5.758701
iter  50 value 5.355368
iter  60 value 3.970210
iter  70 value 2.835171
iter  80 value 2.414463
iter  90 value 2.226375
iter 100 value 2.112400
final  value 2.112400
stopped after 100 iterations
a 4-10-3 network with 83 weights
inputs: Sepal.Length Sepal.Width Petal.Length Petal.Width
output(s): Species
options were - softmax modelling

Also note that neither of the nnet() runs above has converged. Here is an example of a converged model:

set.seed(42)
nnet(Species ~ ., data=iris, size=10, maxit=500)
# weights:  83
initial  value 262.654300
iter  10 value 72.296066
iter  20 value 10.287034
# I've truncated the output here
iter 360 value 0.000277
iter 370 value 0.000117
final  value 0.000097
converged
a 4-10-3 network with 83 weights
inputs: Sepal.Length Sepal.Width Petal.Length Petal.Width
output(s): Species
options were - softmax modelling

Note, "converged" in the output above.

Unfortunately, it's not possible to tune the maxit parameter via the tuneGrid option of the caret train() function. It's probably reasonable to set a high value for maxit in the train() call, but I won't recommend a specific value because, again, it's data dependent. For the iris data I'd try a value an order of magnitude, or two, higher than the number of iterations at which the model converged. Alternatively, you could loop over values for maxit:

num.it <- 500 # maximum number of training iterations
fit.dat <- matrix(ncol=1, nrow=num.it) # fitting criterion values

for(i in 1:num.it) {

    # monitor progress
    cat(i, '\n')
    flush.console()

    # ensure the same set of random starting weights is used each time
    set.seed(42)

    # temporary nnet model trained for at most i iterations
    mod.tmp <- nnet(Species ~ ., data=iris, size=10, maxit=i, trace=FALSE)

    # record the fitting criterion value
    fit.dat[i,] <- mod.tmp$value
}

# extract convergence values
which.min(fit.dat)
[1] 375
fit.dat[which.min(fit.dat)]
[1] 9.654717e-05

# plot fitting values
plot(fit.dat, type='l')

The loop above tunes maxit but doesn't take over-fitting into account. A better approach would be to use the caret train() function with your current tuning grid and cross-validation settings, and check the train() output for convergence.
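Since maxit is not part of caret's tuning grid for nnet (it is passed through to nnet() via train()'s ... argument), one way to compare maxit values under cross-validation is to loop over a few candidates. This is only a sketch on the iris data - the size/decay grid and candidate maxit values are illustrative, not recommendations:

```r
library(caret)
library(nnet)
data(iris)

ctrl <- trainControl(method = "cv", number = 5)
tune.grid <- expand.grid(size = 10, decay = c(0, 0.1))

# maxit is forwarded to nnet() through train()'s '...',
# so loop over candidate values manually
fits <- lapply(c(100, 300, 500), function(m) {
  set.seed(42)  # same folds and starting weights for each maxit value
  train(Species ~ ., data = iris, method = "nnet",
        trControl = ctrl, tuneGrid = tune.grid,
        maxit = m, trace = FALSE)
})

# cross-validated accuracy of the best tuning combination per maxit value
sapply(fits, function(f) max(f$results$Accuracy))
```

Resetting the seed before each train() call keeps the resampling folds identical across the three fits, so any difference in accuracy is attributable to maxit (and the random weight initialisation interacting with it) rather than to different data splits.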

Also, caret and other packages can have surprising reproducibility issues with set.seed(); see the Stack Overflow question "R: set.seed() results don't match if caret package loaded".

Finally, it's unlikely to help here, but it may be interesting to look at the seeds option of the caret trainControl() function. As the docs say, it's probably only useful when running parallel jobs.
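For completeness, here is a sketch of how the seeds argument is structured, assuming 5-fold CV and a 2-row tuning grid (both values are just for illustration): a list with one integer vector per resample, each of length equal to the number of tuning combinations, plus a single final seed for fitting the last model.

```r
library(caret)

set.seed(42)
n.resamples <- 5   # e.g. 5-fold CV
n.tune <- 2        # number of rows in the tuning grid

# one vector of seeds per resample, plus one final seed
seeds <- vector(mode = "list", length = n.resamples + 1)
for (i in 1:n.resamples) seeds[[i]] <- sample.int(1000, n.tune)
seeds[[n.resamples + 1]] <- sample.int(1000, 1)

ctrl <- trainControl(method = "cv", number = n.resamples, seeds = seeds)
```

With the seeds fixed this way, each model fit within each resample starts from a reproducible RNG state, which is what makes it useful for parallel execution.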

[1] https://cran.r-project.org/web/packages/nnet/nnet.pdf

makeyourownmaker