
I have been using Kaggle to train on a GPU. I have found that whenever I train my model again, my accuracy on the validation data changes. I don't get consistent results. Is it because of the GPU I am accessing?

1 Answer


The initial weights of a network are generally random, and randomness causes variation. This means that training a second time will lead to a slightly different solution. To ensure exact reproducibility you need two things: (i) your code must be deterministic, and (ii) you must use the same seed for the random number generator (RNG). The RNG is library-specific. For NumPy and TensorFlow you can set it like this at the start (!) of your program:

import numpy as np
import tensorflow as tf

np.random.seed(1337)
tf.random.set_seed(1337)

This means that repeatedly training the network should give you the same result. To get a different sample, you would have to initialize the RNG differently.
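As a quick illustration of what seeding buys you (a minimal sketch; the seed value 1337 is just an example), reseeding NumPy with the same value reproduces the exact same draws:

import numpy as np

np.random.seed(1337)
a = np.random.rand(3)   # three "random" numbers

np.random.seed(1337)    # reseed with the same value...
b = np.random.rand(3)   # ...and the same three numbers come out again

assert (a == b).all()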

The non-deterministic aspect is important to consider for some optimizers. Adam, for example, was non-deterministic in some previous TensorFlow/CUDA versions, meaning that you were unable to reproduce the same run no matter how hard you tried.

That being said, slight differences can of course also be caused if your code executes cuDNN- or CUDA-optimized methods, since some GPU kernels are non-deterministic by default. Switching between versions can then also potentially cause minor differences.
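On recent TensorFlow versions you can additionally ask for deterministic GPU kernels. A minimal sketch (the availability of these calls depends on your TF version; set_random_seed appeared around TF 2.7 and enable_op_determinism around TF 2.9, so treat those cut-offs as an assumption):

import tensorflow as tf

# Seeds Python's random module, NumPy, and TensorFlow in one call
tf.keras.utils.set_random_seed(1337)

# Restrict TensorFlow to deterministic ops/kernels; ops without a
# deterministic implementation will raise an error instead of
# silently producing run-to-run differences
tf.config.experimental.enable_op_determinism()

Note that deterministic kernels are often slower, so this trades speed for reproducibility.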

runDOSrun