I have a large custom model built with the new TensorFlow 2.0, mixing Keras and TensorFlow APIs. I want to save it (architecture and weights). Exact code to reproduce:

import tensorflow as tf


OUTPUT_CHANNELS = 3

def downsample(filters, size, apply_batchnorm=True):
  initializer = tf.random_normal_initializer(0., 0.02)

  result = tf.keras.Sequential()
  result.add(
      tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                             kernel_initializer=initializer, use_bias=False))

  if apply_batchnorm:
    result.add(tf.keras.layers.BatchNormalization())

  result.add(tf.keras.layers.LeakyReLU())

  return result

def upsample(filters, size, apply_dropout=False):
  initializer = tf.random_normal_initializer(0., 0.02)

  result = tf.keras.Sequential()
  result.add(
    tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                    padding='same',
                                    kernel_initializer=initializer,
                                    use_bias=False))

  result.add(tf.keras.layers.BatchNormalization())

  if apply_dropout:
      result.add(tf.keras.layers.Dropout(0.5))

  result.add(tf.keras.layers.ReLU())

  return result


def Generator():
  down_stack = [
    downsample(64, 4, apply_batchnorm=False), # (bs, 128, 128, 64)
    downsample(128, 4), # (bs, 64, 64, 128)
    downsample(256, 4), # (bs, 32, 32, 256)
    downsample(512, 4), # (bs, 16, 16, 512)
    downsample(512, 4), # (bs, 8, 8, 512)
    downsample(512, 4), # (bs, 4, 4, 512)
    downsample(512, 4), # (bs, 2, 2, 512)
    downsample(512, 4), # (bs, 1, 1, 512)
  ]

  up_stack = [
    upsample(512, 4, apply_dropout=True), # (bs, 2, 2, 1024)
    upsample(512, 4, apply_dropout=True), # (bs, 4, 4, 1024)
    upsample(512, 4, apply_dropout=True), # (bs, 8, 8, 1024)
    upsample(512, 4), # (bs, 16, 16, 1024)
    upsample(256, 4), # (bs, 32, 32, 512)
    upsample(128, 4), # (bs, 64, 64, 256)
    upsample(64, 4), # (bs, 128, 128, 128)
  ]

  initializer = tf.random_normal_initializer(0., 0.02)
  last = tf.keras.layers.Conv2DTranspose(OUTPUT_CHANNELS, 4,
                                         strides=2,
                                         padding='same',
                                         kernel_initializer=initializer,
                                         activation='tanh') # (bs, 256, 256, 3)

  concat = tf.keras.layers.Concatenate()

  inputs = tf.keras.layers.Input(shape=[None,None,3])
  x = inputs

  # Downsampling through the model
  skips = []
  for down in down_stack:
    x = down(x)
    skips.append(x)

  skips = reversed(skips[:-1])

  # Upsampling and establishing the skip connections
  for up, skip in zip(up_stack, skips):
    x = up(x)
    x = concat([x, skip])

  x = last(x)

  return tf.keras.Model(inputs=inputs, outputs=x)

generator = Generator()
generator.summary()

generator.save('generator.h5')
generator_loaded = tf.keras.models.load_model('generator.h5')

I can save the model with:

generator.save('generator.h5')

But when I try to load it with:

generator_loaded = tf.keras.models.load_model('generator.h5')

It never finishes (and there is no error message). Maybe the model is too large? I also tried saving the architecture as JSON with model.to_json(), as well as the full API tf.keras.models.save_model(), but the problem is the same: loading never completes (or at least takes far too long).
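For reference, those other attempts looked roughly like this (the file names are just placeholders):

# Attempt: architecture as JSON, weights saved separately
json_config = generator.to_json()
with open('generator.json', 'w') as f:
    f.write(json_config)
generator.save_weights('generator_weights.h5')

# Attempt: the full save_model API
tf.keras.models.save_model(generator, 'generator_full.h5')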

Same problem on Windows/Linux and with/without GPU.

Saving and restoring work fine with plain Keras and a simple model.

  • How long do you wait before stopping it? –  Apr 30 '19 at 11:25
  • TensorFlow 2.0 is still currently an alpha release; it has bugs, and you shouldn't be using it for normal development. Maybe report this bug and move to a stable TF version. – Dr. Snoopy Apr 30 '19 at 11:27
  • A few minutes. Yes, I know it's just an alpha release, but it may be a mistake on my side. – Ridane Apr 30 '19 at 11:30

5 Answers


As of TensorFlow release 2.0.0 there is now a Keras/TF-agnostic way of saving models, using tf.saved_model:

....

model.fit(images, labels, epochs=30, validation_data=(images_val, labels_val), verbose=1)

tf.saved_model.save(model, "path/to/model_dir")

You can then load it with:

loaded_model = tf.saved_model.load("path/to/model_dir")
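One thing worth noting (assuming TF 2.x behavior; the answer doesn't mention it): tf.saved_model.load returns a generic trackable object, not a tf.keras.Model, so the Keras training and inspection API is not available on it. If you need the Keras model back, you can point the Keras loader at the same directory:

# Restores a tf.keras.Model, with summary(), fit(), etc. available
loaded_keras_model = tf.keras.models.load_model("path/to/model_dir")
loaded_keras_model.summary()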

Try saving the model instead as:

model.save('model_name.model')

Then load it with:

model = tf.keras.models.load_model('model_name.model')
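Note (assuming TF 2.x behavior, which the answer does not state): when the path does not end in .h5, newer TF 2.x releases write the SavedModel directory format by default, and you can also choose the format explicitly:

model.save('model_name.h5') # single HDF5 file
model.save('model_dir', save_format='tf') # SavedModel directory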

I found a temporary workaround. The issue seems to come from the sequential API (tf.keras.Sequential); when the same blocks are built with the functional API instead (see the sketch below), tf.keras.models.load_model manages to load the saved model. I hope this will be fixed in the final release; have a look at the issue I raised on GitHub: https://github.com/tensorflow/tensorflow/issues/28281.
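For example, the downsample block from the question could be rewritten without tf.keras.Sequential (a minimal sketch, keeping the name and signature from the question; upsample can be converted the same way):

import tensorflow as tf

def downsample(filters, size, apply_batchnorm=True):
  initializer = tf.random_normal_initializer(0., 0.02)

  def block(x):
    # Same layers as before, but applied functionally instead of
    # being wrapped in a tf.keras.Sequential
    x = tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                               kernel_initializer=initializer,
                               use_bias=False)(x)
    if apply_batchnorm:
      x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.LeakyReLU()(x)

  return block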



I managed to save and load custom models by implementing functions similar to those of the Sequential model in Keras.

The key functions are CustomModel.get_config() and CustomModel.from_config(), which should also exist on any of your custom layers (similar to the functions below, but see the Keras layers source if you want a better understanding):

import copy  # needed for copy.deepcopy() below

# In the CustomModel class
def get_config(self):
    layer_configs = []
    for layer in self.layers:
        layer_configs.append({
            'class_name': layer.__class__.__name__,
            'config': layer.get_config()
        })
    config = {
        'name': self.name,
        'layers': copy.deepcopy(layer_configs),
        "arg1": self.arg1,
        ...
    }
    if self._build_input_shape:
        config['build_input_shape'] = self._build_input_shape
    return config

@classmethod
def from_config(cls, config, custom_objects=None):
    from tensorflow.python.keras import layers as layer_module
    from tqdm import tqdm  # progress bar used in the loading loop below
    if custom_objects is None:
        custom_objects = {'CustomLayer1Class': CustomLayer1Class, ...}
    else:
        custom_objects = dict(custom_objects, **{'CustomLayer1Class': CustomLayer1Class, ...})

    if 'name' in config:
        name = config['name']
        build_input_shape = config.get('build_input_shape')
        layer_configs = config['layers']
    else:
        name = None
        build_input_shape = None
        layer_configs = config
    model = cls(name=name,
                arg1=config['arg1'],
                should_build_graph=False,
                ...)
    for layer_config in tqdm(layer_configs, 'Loading Layers'):
        layer = layer_module.deserialize(layer_config,
                                         custom_objects=custom_objects)
        model.add(layer) # This function looks at the name of the layers to place them in the right order
    if not model.inputs and build_input_shape:
        model.build(build_input_shape)
    if not model._is_graph_network:
        # Still needs to be built when passed input data.
        model.built = False
    return model

I also added a CustomModel.add() function that adds layers one by one from their configs, as well as a should_build_graph=False parameter that ensures you do not build the graph in __init__() when calling cls() (see the sketch below).
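A rough, hypothetical sketch of how the should_build_graph flag could be wired into the constructor (not the answer's actual code; the Dense layer stands in for whatever your real model builds):

import tensorflow as tf

class CustomModel(tf.keras.Model):
    def __init__(self, arg1, should_build_graph=True, name=None, **kwargs):
        super().__init__(name=name, **kwargs)
        self.arg1 = arg1
        if should_build_graph:
            # Normal construction path: build the layers immediately
            self.dense = tf.keras.layers.Dense(arg1)
        # With should_build_graph=False (as in from_config above), the
        # deserialized layers are attached one by one via CustomModel.add()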

Then the CustomModel.save() function looks like this:

    def save(self, filepath, overwrite=True, include_optimizer=True, **kwargs):
        from tensorflow.python.keras.models import save_model  
        save_model(self, filepath, overwrite, include_optimizer)

After that you can save using:

model.save("model.h5")
new_model = tf.keras.models.load_model('model.h5',
                                       custom_objects={
                                           'CustomModel': CustomModel,
                                           'CustomLayer1Class': CustomLayer1Class,
                                           ...
                                       })

But somehow this approach seems to be quite slow... The approach below, on the other hand, is almost 30x faster. I am not sure why:

model.save_weights("weights.h5")
config = model.get_config()
reinitialized_model = CustomModel.from_config(config)
reinitialized_model.load_weights("weights.h5")

It works, but it seems quite hacky. Maybe future versions of TF2 will make the process clearer.


One other method of saving a trained model is to use the pickle module in Python:

import pickle
pickle.dump(model, open(filename, 'wb'))

In order to load the pickled model:

loaded_model = pickle.load(open(filename, 'rb'))

The extension of the pickle file is usually .sav.

  • Does not work either: "TypeError: can't pickle _thread.RLock objects" – Ridane Apr 30 '19 at 12:12
  • There is a workaround for that error in the following link. Why not give it a shot? https://stackoverflow.com/questions/44855603/typeerror-cant-pickle-thread-lock-objects-in-seq2seq – bg2094 May 02 '19 at 05:59
  • Also, how big was the h5 file? A couple of gigabytes at the least, I suppose? – bg2094 May 02 '19 at 06:04
  • The *.h5 file for the example above is 212,722 KB. Yes, I could give the pickle module a shot, but I'd rather use the TensorFlow API in a clean way, and I don't think that size is the issue here; it seems to be deeper. In my opinion, a lot of people will use the keras.save API, so I opened an issue on the TensorFlow GitHub here: https://github.com/tensorflow/tensorflow/issues/28281 :) – Ridane May 02 '19 at 13:29