
I am currently trying to train a model, and my input pipeline is constructed as in this answer here. I want to save my model after each epoch, but after training for a few epochs the training crashes. I have read that this happens because the input is added as a constant tensor to the graph. There are suggested solutions here to use a tf.placeholder to work around this, but unfortunately it doesn't solve the problem for me. The input pipeline looks as follows:

....
filenames = [P_1]
dataset = tf.data.TFRecordDataset(filenames)
def _parse_function(example_proto):
    keys_to_features = {'data': tf.VarLenFeature(tf.float32)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return tf.sparse_tensor_to_dense(parsed_features['data'])
# Parse the record into tensors.
dataset = dataset.map(_parse_function)
# Shuffle the dataset
dataset = dataset.shuffle(buffer_size=1000)
# Repeat the input indefinitely
dataset = dataset.repeat()      
# Generate batches     
dataset = dataset.batch(Batch_size) 
# Create a one-shot iterator
iterator = dataset.make_one_shot_iterator()
data = iterator.get_next()   
....
for i in range(epochs):
    for ii in range(iteration):
        image = sess.run(data)
        ....
    saver.save(sess, 'filename')
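
For reference, a minimal sketch of the placeholder-fed variant that was suggested (and which did not solve it in my case) could look roughly like this, assuming TF 1.x and the same P_1 record file as above:

# Feed the file list through a placeholder so nothing is baked into the graph as a constant
filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=1000)
dataset = dataset.repeat()
dataset = dataset.batch(Batch_size)
# An initializable iterator lets the placeholder be fed at run time
iterator = dataset.make_initializable_iterator()
data = iterator.get_next()
sess.run(iterator.initializer, feed_dict={filenames: [P_1]})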

The error message looks as follows

[libprotobuf FATAL external/protobuf_archive/src/google/protobuf/message_lite.cc:68] CHECK failed: (byte_size_before_serialization) == (byte_size_after_serialization): tensorflow.GraphDef was modified concurrently during serialization.
terminate called after throwing an instance of 'google::protobuf::FatalException'  
what():  CHECK failed: (byte_size_before_serialization) == (byte_size_after_serialization): tensorflow.GraphDef was modified concurrently during serialization.
Aborted
D_negn
  • Do you have more to the model (i.e. do the `....`s contain code)? This is a horribly worded error saying that the checkpoint you want to save is too large a file, so I don't think that the dataset code is your issue. – McAngus Oct 09 '18 at 15:05
  • The model is an autoencoder, which contains de/convolution layers in the encoder and decoder. When I saved the images in a CSV file and read them directly, this was not a problem, which is why I thought it is caused by the dataset. – D_negn Oct 10 '18 at 14:19

1 Answer


The problem looks like it is in the _parse_function. Make sure the features are parsed in the same way they were written when you created the TFRecord file, for example that they have the same data type.
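
As an illustration, a matching writer/parser pair for the 'data' feature from the question might look roughly like this (a sketch assuming the records hold a flat list of floats; the write_record helper name is made up for illustration):

import tensorflow as tf

# Writing: store the flattened float values under the key 'data'
def write_record(values, path):  # hypothetical helper, for illustration only
    with tf.python_io.TFRecordWriter(path) as writer:
        feature = {'data': tf.train.Feature(float_list=tf.train.FloatList(value=values))}
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())

# Parsing: the key and dtype must match what was written above
def _parse_function(example_proto):
    keys_to_features = {'data': tf.VarLenFeature(tf.float32)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return tf.sparse_tensor_to_dense(parsed_features['data'])

If the writer used a different key or, say, an int64 list instead of a float list, the parser would fail or return unexpected values, so the keys and dtypes on both sides have to agree.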

Habeshaw