
I am having trouble extracting caret's finalModel parameters for nnet. If I use what are, in my mind, exactly the same parameters for caret::train and nnet::nnet, I sometimes get big differences. Have I forgotten a parameter, or is this due to the training algorithm of the neural network? I am aware that I can use predict on caret_net (in the example below), but I would still like to reproduce the results with nnet alone.

Example:

library(nnet)
library(caret)

len <- 100
set.seed(4321)
X <- data.frame(x1 = rnorm(len, 40, 25), x2 = rnorm(len, 70, 4), x3 = rnorm(len, 1.6, 0.3))
y <- 20000 + X$x1 * 3 - X$x1*X$x2 * 4 - (X$x3**4) * 7 + rnorm(len, 0, 4)
XY <- cbind(X, y)

# pre-processing
preProcPrms <- preProcess(XY, method = c("center", "scale"))
XY_pre <- predict(preProcPrms, XY)

# caret-nnet
controlList <- trainControl(method = "cv", number = 5)
tuneMatrix <- expand.grid(size = c(1, 2), decay = c(0, 0.1))

caret_net <- train(x = XY_pre[ , colnames(XY_pre) != "y"],
                   y = XY_pre[ , colnames(XY_pre) == "y"],
                   method = "nnet",
                   linout = TRUE,
                   TRACE = FALSE,
                   maxit = 100,
                   tuneGrid = tuneMatrix,
                   trControl = controlList)

# nnet-nnet
nnet_net <- nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
                 y = XY_pre[ , colnames(XY_pre) == "y"],
                 linout = caret_net$finalModel$param$linout,
                 TRACE = caret_net$finalModel$param$TRACE,
                 size = caret_net$bestTune$size,
                 decay = caret_net$bestTune$decay,
                 entropy = caret_net$finalModel$entropy,
                 maxit = 100)

# print
print(caret_net$finalModel)
print(nnet_net)

y_caret <- predict(caret_net$finalModel, XY_pre[ , colnames(XY_pre) != "y"])
y_nnet <- predict(nnet_net, XY_pre[ , colnames(XY_pre) != "y"])

plot(y_caret, y_nnet, main = "Hard to spot, but y_caret <> y_nnet - which prm have I forgotten?")
hist(y_caret - y_nnet)

Thx & kind regards

r.user.05apr
    The question about discrepancies between `caret` and baseline packages comes up quite often. Many times, the discrepancy can be traced to random number generation. Neural net training typically starts from a random state. Since I don't see `set.seed` anywhere in your code, it's reasonable to expect that `caret::train` and `nnet::nnet` start from two different states. Consequently, they likely converge to two different local optima. – Artem Sokolov Jan 24 '18 at 15:07
  • set.seed is in line 4. Sometimes the differences are rather large (I couldn't construct a better example though, because I haven't found the cause). – r.user.05apr Jan 24 '18 at 15:10
  • Ah, my apologies. I missed it on line 4. I think you want the random seed to be the same right before `nnet::nnet` and right before `caret::train` makes a call that uses the same set of parameters. The way the code is written right now, a different number of random draws happens between `set.seed` and the two training calls, which likely results in different states. – Artem Sokolov Jan 24 '18 at 15:13
  • 1
    I agree with @Artem Sokolov; just running the `nnet` model with different seeds results in variation that is comparable to caret's model. Just setting the same seed prior to both models will not accomplish much, since different RNG operations are involved in caret and nnet. – missuse Jan 24 '18 at 15:14
  • 1
    Try the following experiment: (i) set the meta-parameter tuning matrix to contain a single set of parameters only; (ii) set the same random seed right before `caret::train` and right before `nnet::nnet`. Since `caret` will only consider a single set of meta-parameters, it should produce a single model that matches your own `nnet` call. – Artem Sokolov Jan 24 '18 at 15:18
  • 1
    @Artem Sokolov This indeed provides the same RMSE. – missuse Jan 24 '18 at 15:22
  • Great! You should post the results of your experiment and the conclusions as an answer, @missuse. Will be happy to upvote! – Artem Sokolov Jan 24 '18 at 15:27
  • Could you post how you set your tuning matrix? – r.user.05apr Jan 24 '18 at 15:31

1 Answer


As stated in the comments, the discrepancy is caused by different random seeds. To quote @Artem Sokolov: "Neural net training typically starts from a random state. It's reasonable to expect that caret::train and nnet::nnet start from two different states. Consequently, they likely converge to two different local optima."
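To see this sensitivity directly, here is a minimal sketch (it reuses XY_pre from the question; fit_once and the seed values are just illustrative): nnet draws its initial weights from the current RNG state, so two different seeds usually end in two different fits, while repeating the same seed reproduces the fit exactly.

fit_once <- function(seed) {
  # the seed set right before the call determines nnet's random starting weights
  set.seed(seed)
  nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
       y = XY_pre[ , colnames(XY_pre) == "y"],
       size = 2, decay = 0, linout = TRUE, trace = FALSE, maxit = 100)
}

m1 <- fit_once(1)
m2 <- fit_once(2)
m3 <- fit_once(1)

all.equal(m1$wts, m2$wts)  # not TRUE: different seeds, different local optima
all.equal(m1$wts, m3$wts)  # TRUE: same seed reproduces the weights exactly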

To get a reproducible model, start both calls from the same seed:

controlList <- trainControl(method = "none", seeds = 1)
tuneMatrix <- expand.grid(size = 2, decay = 0)

set.seed(1)
caret_net <- train(x = XY_pre[ , colnames(XY_pre) != "y"],
                   y = XY_pre[ , colnames(XY_pre) == "y"],
                   method = "nnet",
                   linout = TRUE,
                   TRACE = FALSE,
                   maxit = 100,
                   tuneGrid = tuneMatrix,
                   trControl = controlList)

set.seed(1)
nnet_net <- nnet(x = XY_pre[ , colnames(XY_pre) != "y"],
                 y = XY_pre[ , colnames(XY_pre) == "y"],
                 linout = caret_net$finalModel$param$linout,
                 TRACE = caret_net$finalModel$param$TRACE,
                 size = caret_net$bestTune$size,
                 decay = caret_net$bestTune$decay,
                 entropy = caret_net$finalModel$entropy,
                 maxit = 100)

y_caret <- predict(caret_net, XY_pre[ , colnames(XY_pre) != "y"])
y_nnet <- predict(nnet_net, XY_pre[ , colnames(XY_pre) != "y"])


all.equal(as.vector(y_caret[,1]), y_nnet[,1])
#TRUE

Apart from setting the same seed, the key is to avoid resampling in caret, since resampling consumes random numbers (and therefore changes the RNG state) before the model training happens.
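If you do want caret to cross-validate, trainControl() also has a seeds argument for pinning the RNG state inside train(). The sketch below follows my reading of ?trainControl (the list needs B + 1 elements: one integer per tuning-grid row for each of the B resamples, plus a single integer for the final model fit); controlCV and the seed values are just illustrative. This makes train() itself reproducible from run to run, but reproducing the final model with a bare nnet() call is still simplest with method = "none" as above.

B <- 5                                  # number of CV folds
M <- nrow(tuneMatrix)                   # tuning-parameter combinations
cvSeeds <- c(lapply(seq_len(B), function(i) i * 100 + seq_len(M)),
             list(9999))                # last element seeds the final model fit
controlCV <- trainControl(method = "cv", number = 5, seeds = cvSeeds)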

missuse