Keras LSTM - why different results with "same" model & same weights?

Question

(NOTE: Properly fixing the RNG state before each model creation, as described in the comments, practically fixed my problem: results are now consistent to within 3 decimals. They aren't exactly identical though, so somewhere there's a hidden source of randomness that seeding the RNG doesn't fix... probably some lib uses the time in milliseconds or something. If anyone has an idea about that, it would be cool to know, so I'll wait and not close the question yet :) )

I create a Keras LSTM model (used to predict some time series data, not important what), and every time I try to re-create an identical model (same model config loaded from json, same weights loaded from file, same args to the compile function), I get wildly different results on the same train and test data. WHY?

Code is roughly like this:

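# fix random
import random
random.seed(42)

from keras.models import Sequential, model_from_json
from keras.layers import LSTM, Dense, Activation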

# make model & compile
model = Sequential([
    LSTM(50, input_shape=(None, 1), return_sequences=True),
    LSTM(100, return_sequences=False),
    Dense(1),
    Activation("linear")
])
model.compile(loss="mse", optimizer="rmsprop")

# save it and its initial random weights
model_json = model.to_json()
model.save_weights("model.h5")

# fit and predict
model.fit(x_train, y_train, epochs=3)
r = model.predict(x_test)

# create new "identical" model
model2 = model_from_json(model_json)
model2.load_weights("model.h5")
model2.compile(loss="mse", optimizer="rmsprop")

# fit and predict "identical" model
model2.fit(x_train, y_train, epochs=3)
r2 = model2.predict(x_test)

# ...different results :(

I know that the model has initial random weights, so I'm saving them up and reloading them. I'm also paranoid enough to assume there are some "hidden" params that I may not know of, so I serialize model to json and reload instead of recreating an identical one by hand (tried that, same thing btw). And I also fixed the random number generator.

It's my first time with Keras, and I'm also a beginner to neural networks in general. But this drives me crazy... wtf can vary?!

On fixing random number generators: I run Keras with the TensorFlow backend, and I have these lines of code at the start to try and fix the RNGs for experimental purposes:

import random
random.seed(42)
import numpy
numpy.random.seed(42)
from tensorflow import set_random_seed
set_random_seed(42)

...but they still don't fix the randomness.

And I understand that the goal is to make my model behave non-randomly despite the inherent stochastic nature of NNs. But I need to temporarily fix this for experimental purposes (I'm even OK with it being reproducible on one machine only!).

I am not sure how this could affect the results, but you haven't "fixed" the random number generator for the second model. You'd need to start it again from the same state (seed=42), and you'd need to run exactly the same set of calls to the generator the second time. Furthermore, you don't know how Keras is getting its random numbers! It's likely, in fact, that it's not getting them from the `random` module. It might not even be getting them from `numpy` either, as the answer below assumes. — senderle, Sep 08 '17 at 15:14
You should specify the seed differently if you want to get consistent results. Depending on the keras backend (theano or tensorflow), there are two ways to specify the random seed. See here: https://stackoverflow.com/questions/45970112/keras-lstm100-and-lstmunits-100-produces-different-results/45970553#45970553 — Miriam Farber, Sep 08 '17 at 15:15
@senderle THIS. I didn't realize that *of course the RNG state changes when it's run*, so I don't only need to fix it at the beginning, but also to re-fix it before making model2... guess it's "Friday fried brain" :) This *almost* fixed my problem, in the sense that there is still randomness, but to the first 3 decimals it's reproducible (I imagine some library dependency has its own hidden randomish thingy). This is good enough that I can distinguish "actual variance" (different predictions on only slightly different training data) from "model randomness" and can start working on fixing the first! Thx! – NeuronQ, Sep 08 '17 at 16:24
@NeuronQ There are various seeds you can set, please look at the example below. — Milind Deore, Apr 10 '18 at 08:19
I have the same issue. Did you ever work out where the other randomness is coming from? — SomePhysicsStudent, May 15 '20 at 16:55
@SomePhysicsStudent not fully. After "also re-fix RNG states between runs / model creations etc." the randomness got reduced to the point of having 3-4 decimals non-random, which was good enough to push on with work and "ship" a version of that... I haven't touched LSTMs and time-series deep learning since. My advice: retry on the latest version of keras/tf. I'm 90% sure the bug was somewhere deep in some tf C++ code where some code path just ignored the given random seed while others didn't, and it's not worth debugging that on >1yr old code... – NeuronQ, May 17 '20 at 07:51
Thanks for that! I found out that TensorFlow 2.2 ships with an OS environment variable, TF_DETERMINISTIC_OPS, which if set to '1' will ensure that only deterministic GPU ops are used. Setting it to 1 fixed most of my GPU non-determinism, except for a few TensorFlow ops that I ended up leaving on the CPU. – SomePhysicsStudent, May 17 '20 at 10:22
@SomePhysicsStudent That looks like a very useful tip! I added this as a note, by editing main accepted answer, to be of use to anyone else stumbling upon this question but not paying much attention to comments. Good luck with your work! — NeuronQ, May 18 '20 at 09:00
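
To illustrate senderle's point about RNG state, here is a minimal sketch using only Python's `random` module. `init_weights` is a hypothetical stand-in for whatever consumes random numbers while a model is built or trained; it is not a Keras function. The same idea should carry over to NumPy's and TensorFlow's generators, but that is an assumption here, not something this snippet tests.

```python
import random


def init_weights(n):
    """Hypothetical stand-in for any step that draws random numbers."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


random.seed(42)
first = init_weights(3)
second = init_weights(3)  # the generator has advanced, so these values differ

random.seed(42)           # put the generator back in its starting state
third = init_weights(3)   # same draws as `first`

print(first == second)  # False
print(first == third)   # True
```

Seeding once at the top of a script therefore only pins down the first run of draws; anything created later starts from wherever the generator happens to be.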

score 11 · Accepted Answer · edited May 18 '20 at 08:58

Machine learning algorithms are in general non-deterministic: every time you run them, the outcome may vary. This has to do with the random initialization of the weights. If you want reproducible results, you have to take the randomness out of the picture. A simple way to do this is to set a random seed.

import numpy as np
import tensorflow as tf

np.random.seed(1234)
tf.random.set_seed(1234)

# rest of your code

If you want to keep the randomness but with less variance in your output, I would suggest either lowering your learning rate or changing your optimizer (I would suggest an SGD optimizer with a relatively low learning rate). A cool overview of gradient descent optimization is available here!

Note that TensorFlow's random generators use an internal counter in addition to the global seed (i.e. tf.random.set_seed()), so if you run

tf.random.set_seed(1234)
print(tf.random.uniform([1]).numpy())
print(tf.random.uniform([1]).numpy())

You'll get 0.5380393 and 0.3253647, respectively. However if you re-run that same snippet, you'll get the same two numbers again.

A detailed explanation of how random seeds work in TensorFlow can be found here.

For newer TF versions, take care of this too: TensorFlow 2.2 ships with an OS environment variable, TF_DETERMINISTIC_OPS, which if set to '1' will ensure that only deterministic GPU ops are used.
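
As background on why an op can be non-deterministic even with every seed fixed (this is general floating-point behaviour, offered as a likely explanation and not something confirmed in this thread): floating-point addition is not associative, so a parallel reduction that adds the same numbers in a different order can return a slightly different result. A minimal sketch in plain Python:

```python
import math
import random

# The same three numbers, added in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)

# An exactly rounded sum does not depend on the order of its inputs.
random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
shuffled = values[:]
random.shuffle(shuffled)
print(math.fsum(values) == math.fsum(shuffled))  # True
```

Differences of this kind start in the last few bits, which would be consistent with results that agree to a few decimals but not exactly.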

edited May 18 '20 at 08:58

NeuronQ

answered Sep 08 '17 at 15:12

Djib2011

This will work if Keras uses the Theano backend. For the TensorFlow backend you need to specify the seed with set_random_seed as well. See here: https://stackoverflow.com/questions/45970112/keras-lstm100-and-lstmunits-100-produces-different-results/45970553#45970553 – Miriam Farber Sep 08 '17 at 15:18
ok, so first thanks for reminding me of `numpy.random.seed`, I've only set `random.seed`, but... it doesn't change anything :| Yeah, reducing overall variance in output is the actual goal, but *right now* I just want to *understand why tf this is happening?!* Like I fixed *everything*, I should get the same results on same data! After that yeah, there's a lot to optimize/tweak... but first things first: _sanity and reproducibility_ is what I want. – NeuronQ Sep 08 '17 at 15:20
@NeuronQ Is your backend Theano? If yes, it seems to me that this should be related somehow to the way you are saving and loading your weights, or the way you are serializing. I tried to run your model after adding `np.random.seed(42)`, using the data `x_train = np.reshape(np.arange(10), (10, 1, 1))`, `x_test = x_train[:]`, `y_train = 2 * np.arange(10)`, and got exactly the same results each time I ran it. This means that the model itself is reproducible when we ignore all the "loading the data / weights" part. – Miriam Farber Sep 08 '17 at 15:31
@MiriamFarber tried that too, still random. I'll probably get on with trying to tweak the optimizer and other parameters to reduce variability... but I'm still *incredibly annoyed that somewhere I have an extra source of randomness that I don't know of!* ...I understand NNs are stochastic in nature, but I want to *know* what each source of randomness is, and to be able to fix it at least on one machine when running experiments. – NeuronQ Sep 08 '17 at 15:32
@MiriamFarber backend is TF – NeuronQ Sep 08 '17 at 15:32
@NeuronQ In that case, add the following 4 lines to the top of your code: `from numpy.random import seed`, `seed(1)`, `from tensorflow import set_random_seed`, `set_random_seed(2)` – Miriam Farber Sep 08 '17 at 15:34
@MiriamFarber did that, also edited question to mention I did it. didn't work. might be a bug that it ignores the seed setting... – NeuronQ Sep 08 '17 at 15:37
@Djib2011 sorry to bother you, but after coming back to this problem, after solving the initial randomness issues, I came across a weird finding: lowering the learning rate *helps a lot* in reducing variance... but not with SGD! (other optimizers like rmsprop work great). Do you happen to know *in what cases SGD performs exceptionally badly for an RNN?* thanks! – NeuronQ Oct 05 '17 at 15:01
I'm not sure. If I were to guess, it has something to do with the mini-batches used for calculating the derivative. With SGD you don't move towards the best direction for your entire input data, but just for a batch. It is supposed to take a lot more epochs to converge than normal GD but each epoch is calculated a lot faster. So finally SGD is faster than normal GD. I guess that this adds more variance to your system... I'd recommend this if you're interested in reading more on optimization algorithms: http://ruder.io/optimizing-gradient-descent/ – Djib2011 Oct 06 '17 at 13:14
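
The mini-batch argument in the last comment can be sketched with a toy one-parameter regression in plain Python. Everything here (the data, the batch size of 8, the helper `mse_gradient`) is made up for illustration and has nothing to do with the asker's LSTM; it only shows that gradients computed on random batches scatter around the full-data gradient.

```python
import random

random.seed(0)

# Toy data: y is roughly 2 * x, with some Gaussian noise added.
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]


def mse_gradient(w, indices):
    """Derivative w.r.t. w of mean((w * x - y) ** 2) over the given samples."""
    return sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in indices) / len(indices)


w = 0.0
full = mse_gradient(w, range(len(xs)))                   # gradient on all data
batches = [random.sample(range(len(xs)), 8) for _ in range(5)]
mini = [mse_gradient(w, batch) for batch in batches]     # one estimate per batch

print("full-batch gradient: ", full)
print("mini-batch gradients:", mini)
```

Each mini-batch value is a noisy estimate of the full-batch one, so which batches are drawn, and in what order, affects where training ends up. Whether this explains the SGD behaviour described above is still a guess, as the comment itself says.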

score 5 · Answer 2 · answered Apr 10 '18 at 08:18

This code is for keras using tensorflow backend

This is because the weights are initialised using random numbers, and hence you will get different results every time. This is expected behaviour. To get reproducible results you need to set the random seeds. The example below sets operation-level and graph-level seeds; for more information look here.

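# Note: written for the TensorFlow 1.x API; in TF 2.x, tf.set_random_seed,
# tf.ConfigProto and tf.Session are found under tf.compat.v1.
import os
import numpy as np
import tensorflow as tf
import random as rn

os.environ['PYTHONHASHSEED'] = '0'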

# Setting the seed for numpy-generated random numbers
np.random.seed(37)

# Setting the seed for python random numbers
rn.seed(1254)

# Setting the graph-level random seed.
tf.set_random_seed(89)

from keras import backend as K

session_conf = tf.ConfigProto(
      intra_op_parallelism_threads=1,
      inter_op_parallelism_threads=1)

#Force Tensorflow to use a single thread
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)

K.set_session(sess)

# Rest of the code follows from here on ...

I'm curious if you can add why setting intra_op_parallelism_threads is necessary; I've not seen this elsewhere in recommendations for removing as much randomization as possible. Are there operations in TF that have been shown to produce random results based on thread-dependent behavior that can't be predicted, or is this just to make absolutely sure? Others have recommended removing GPU support (not an option for a lot of people, I think), but it sounds like from TF 2.2 forward the environment variable "TF_DETERMINISTIC_OPS" = '1' should handle that. – Brendano257, May 14 '21 at 18:10
AttributeError: module 'tensorflow' has no attribute 'set_random_seed' — keramat, Jun 23 '21 at 18:39
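
The AttributeError above is what TensorFlow 2.x raises for the 1.x-style calls in this answer. A rough TF 2.x counterpart might look like the sketch below. The helper name `configure_reproducibility` is hypothetical, a single seed is used where the answer uses three, and the calls assume a TF 2.x install with NumPy available. It has not been checked against the asker's model, and it is not claimed to remove every source of non-determinism.

```python
import os
import random


def configure_reproducibility(seed):
    """Hypothetical helper: TF 2.x counterpart of the TF 1.x setup above."""
    # Environment variables go first, before TensorFlow is imported.
    # PYTHONHASHSEED is read at interpreter start-up, so setting it here only
    # reaches child processes; export it in the shell to affect this process.
    os.environ["PYTHONHASHSEED"] = "0"
    os.environ["TF_DETERMINISTIC_OPS"] = "1"  # TF 2.2+, per the comments above

    random.seed(seed)

    # Imported lazily so the environment variables above are already set.
    import numpy as np
    import tensorflow as tf

    np.random.seed(seed)

    # Stands in for ConfigProto(intra/inter_op_parallelism_threads=1).
    # These must be called before TensorFlow has executed any op.
    tf.config.threading.set_intra_op_parallelism_threads(1)
    tf.config.threading.set_inter_op_parallelism_threads(1)

    # Stands in for tf.set_random_seed, which TF 2.x no longer exposes.
    tf.random.set_seed(seed)
```

Calling `configure_reproducibility(89)` once at the very top of a script, and again before building each model as discussed under the question, would be the intended usage.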

score 3 · Answer 3 · answered Jul 24 '21 at 06:56

I resolved this issue by adding os.environ['TF_DETERMINISTIC_OPS'] = '1'

Here is an example:

import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
#rest of the code
# tf version 2.3.1

answered Jul 24 '21 at 06:56

Francesco Laiti

score 0 · Answer 4 · edited Apr 03 '23 at 05:19

None of the previous answers worked for me except these two lines:

import tensorflow as tf

tf.keras.utils.set_random_seed(1)
tf.config.experimental.enable_op_determinism()

edited Apr 03 '23 at 05:19

Eric Aya

answered Apr 02 '23 at 17:33

user21550791
