
I am comparing fitting accuracy results across different levels of data quality. "Good data" is data without any NA in the feature values; "bad data" has NA in the feature values. Bad data has to be fixed by some value correction, such as replacing each NA with zero or with the feature's mean value.
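For concreteness, the two corrections could look like the sketch below. It assumes the features are a 2-D NumPy float array with NA stored as `np.nan`; the helper names `fill_na_zero` and `fill_na_mean` are hypothetical.

```python
import numpy as np

def fill_na_zero(x):
    """Replace every NaN in the feature matrix with 0."""
    return np.where(np.isnan(x), 0.0, x)

def fill_na_mean(x):
    """Replace every NaN with the mean of its column (feature), ignoring NaNs."""
    col_means = np.nanmean(x, axis=0)            # one mean per feature column
    return np.where(np.isnan(x), col_means, x)   # col_means broadcasts along rows

# Toy "bad" data: two features, with some values missing.
x_bad = np.array([[1.0, np.nan],
                  [3.0, 4.0],
                  [np.nan, 8.0]])

print(fill_na_zero(x_bad))
print(fill_na_mean(x_bad))
```

If a column is entirely NA, `np.nanmean` returns NaN for it (with a RuntimeWarning), so such columns need separate handling.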

In my code, I am trying to perform multiple fitting procedures.

Here is the simplified code:

from keras import backend as K
...
xTrainGood = ...  # the good version of the xTrain data
xTrainBad = ...   # the bad version of the xTrain data
...
model = Sequential()
model.add(...)
...
historyGood = model.fit(..., xTrainGood, ...)  # fit the model with the original data
                                               # (no NA, zeroes, or feature mean values)

Here is the fitting accuracy plot based on historyGood:

[Image: fitting accuracy plot for historyGood]

After that, the code resets the stored model and re-trains it with the "bad" data:

K.clear_session()

historyBad = model.fit(..., xTrainBad, ...)

Here are the fitting results based on historyBad:

[Image: fitting accuracy plot for historyBad, trained after K.clear_session()]

As you can see, the initial accuracy is already above 0.7, which means the model "remembers" the previous fitting.

For comparison, these are the results of fitting the "bad" data on its own:

[Image: fitting accuracy plot for the "bad" data trained standalone]

How can I reset the model to its initial state?

Ruben Kazumov

4 Answers


K.clear_session() isn't enough to reset states and ensure reproducibility. You'll also need to:

  • Set (& reset) random seeds
  • Reset TensorFlow default graph
  • Delete previous model

The code below accomplishes each step:

reset_seeds()
model = make_model() # example function to instantiate model
model.fit(x_good, y_good)

del model
K.clear_session()
tf.compat.v1.reset_default_graph()

reset_seeds()
model = make_model()
model.fit(x_bad, y_bad)

Note that if other variables reference the model, you should delete them as well, e.g. with model = make_model(); model2 = model, run del model, model2; otherwise the model may persist. Lastly, TF random seeds aren't as easily reset as those of random or NumPy, and require the graph to be cleared beforehand.
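To see why the extra references matter, here is a small pure-Python sketch. `FakeModel` and `make_model` are hypothetical stand-ins for a Keras model and its builder, and `weakref` is used only to observe when the object is actually freed. In CPython an object is released as soon as its last strong reference disappears, so deleting one name is not enough while another name still points at the same object.

```python
import gc
import weakref

class FakeModel:
    """Hypothetical stand-in for a Keras model (just holds some 'weights')."""
    def __init__(self):
        self.weights = [0.0] * 10

def make_model():
    return FakeModel()

model = make_model()
model2 = model                 # a second name bound to the SAME object
probe = weakref.ref(model)     # weak reference: does not keep the object alive

del model
gc.collect()
still_alive_after_one_del = probe() is not None   # model2 still refers to it

del model2
gc.collect()
freed_after_both_dels = probe() is None           # no strong references left

print(still_alive_after_one_del, freed_after_both_dels)
```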


Function/modules used:
import tensorflow as tf
import numpy as np
import random
import keras.backend as K

def reset_seeds():
    np.random.seed(1)
    random.seed(2)
    if tf.__version__[0] == '2':
        tf.random.set_seed(3)
    else:
        tf.set_random_seed(3)
    print("RANDOM SEEDS RESET")
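As a quick check of what the random/NumPy half of reset_seeds() buys you, this TensorFlow-free sketch reseeds both generators and confirms that the "initial state" drawn afterwards is identical. `draw_initial_state` is a hypothetical stand-in for model construction (a kernel initializer plus a sample shuffle). On TF 2.7 and later, `tf.keras.utils.set_random_seed(seed)` seeds Python, NumPy and TensorFlow in a single call.

```python
import random
import numpy as np

def reset_py_np_seeds():
    """The random/NumPy part of reset_seeds() above (TensorFlow part omitted)."""
    np.random.seed(1)
    random.seed(2)

def draw_initial_state():
    """Hypothetical stand-in for building a model: draws from both global RNGs."""
    w = np.random.uniform(-0.05, 0.05, size=(4, 3))  # like a Dense kernel initializer
    order = random.sample(range(10), k=10)            # like a shuffled sample order
    return w, order

reset_py_np_seeds()
w1, order1 = draw_initial_state()
w2, order2 = draw_initial_state()   # no reset: the generators have moved on

reset_py_np_seeds()
w3, order3 = draw_initial_state()   # after a reset: same draws as the first call

print(np.array_equal(w1, w3), order1 == order3)
print(np.array_equal(w1, w2))
```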
OverLordGoldDragon

I had a similar issue when training many models in a loop in a single file. I tried many things with Keras/TF (version 2.5), including the answers in this thread. Nothing helped except one thing: running one file from another with the subprocess module, which guarantees a fresh kernel (interpreter) every single time.

In the simplest setup, keep the training code in one file and launch it from a different file, once for the initial model and again for each subsequent model. To run one file from another, put this in the second file:

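import subprocess

run_no = [0, 1, 2, 3]
for i in run_no:
    # each call starts a new process, so no Keras/TF state carries over between runs
    subprocess.run(["ipython", "your_main_file.ipynb", str(i)])    # for Jupyter
    #subprocess.run(["python3", "your_main_file.py", str(i)])      # for plain Python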
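A self-contained sketch of the same idea, assuming the per-run training code can be launched as a separate Python script. Here the child "script" is an inline string passed via `-c` (standing in for your_main_file.py), and it only reports its run index and process ID, to show that every run executes in a new child process and therefore starts with a brand-new Keras/TF state. `CHILD_SCRIPT` and `run_all` are hypothetical names.

```python
import os
import subprocess
import sys

# Stand-in for your_main_file.py: a real script would read the run index,
# build a fresh model, train it and save its history to disk.
CHILD_SCRIPT = (
    "import os, sys\n"
    "run_index = int(sys.argv[1])\n"
    "print(run_index, os.getpid())\n"
)

def run_all(run_ids):
    """Run each job in its own interpreter; return (run_index, child_pid) pairs."""
    results = []
    for i in run_ids:
        completed = subprocess.run(
            [sys.executable, "-c", CHILD_SCRIPT, str(i)],
            capture_output=True,
            text=True,
            check=True,  # raise if a run fails instead of silently continuing
        )
        idx, pid = map(int, completed.stdout.split())
        results.append((idx, pid))
    return results

if __name__ == "__main__":
    for idx, pid in run_all([0, 1, 2, 3]):
        print(f"run {idx} ran in process {pid} (parent is {os.getpid()})")
```

Because nothing survives between runs, each run has to write its results (for example the history.history dict returned by fit) to a file so the runs can be compared afterwards.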
Ramm

You are using K.clear_session() the wrong way. To get a model with randomly initialized weights, delete the old model (with the del keyword), then create a new model and train it.

Then you can call K.clear_session() after each fitting procedure.
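The effect from the question, and this fix, can be illustrated without TensorFlow. In the sketch below `TinyLogReg` is a hypothetical NumPy stand-in for a Keras model: like a Keras model, it keeps its weights between fit() calls, so reusing the object makes the second fit start from already-trained weights. Building a new object through a factory (`make_model`, also hypothetical) starts from scratch. The data is synthetic, and the "bad" data simulates NA values already replaced by zero.

```python
import numpy as np

class TinyLogReg:
    """Hypothetical stand-in for a Keras model: weights persist across fit() calls."""
    def __init__(self, n_features):
        self.w = np.zeros(n_features)
        self.b = 0.0

    def _predict_proba(self, x):
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def accuracy(self, x, y):
        return float(np.mean((self._predict_proba(x) > 0.5) == y))

    def fit(self, x, y, epochs=20, lr=0.5):
        """Plain gradient descent; history[0] is the accuracy BEFORE any training."""
        history = [self.accuracy(x, y)]
        for _ in range(epochs):
            grad = self._predict_proba(x) - y
            self.w -= lr * x.T @ grad / len(y)
            self.b -= lr * grad.mean()
            history.append(self.accuracy(x, y))
        return history

def make_model(n_features):
    """Factory: every call returns a model with freshly initialized weights."""
    return TinyLogReg(n_features)

rng = np.random.default_rng(0)
x_good = rng.normal(size=(200, 2))
y_good = (x_good[:, 0] + x_good[:, 1] > 0).astype(float)
x_bad = np.where(rng.random(x_good.shape) < 0.2, 0.0, x_good)  # simulated NA -> 0
y_bad = y_good

# Problem: the same object is fitted twice, so the second fit starts "pre-trained".
model = make_model(2)
model.fit(x_good, y_good)
reused_start = model.fit(x_bad, y_bad)[0]

# Fix: delete the old model and build a new one before fitting the next dataset.
del model
model = make_model(2)
fresh_start = model.fit(x_bad, y_bad)[0]

print(f"initial accuracy, reused model: {reused_start:.2f}")
print(f"initial accuracy, fresh model:  {fresh_start:.2f}")
```

With a real Keras model the loop has the same shape: build with your own make_model(), fit, then del model and K.clear_session() before building the next one.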

Dr. Snoopy

Isn't instantiating a new model object under the same name enough?

model = make_model()
Antonio