
I am currently trying to train a model, and my input pipeline is constructed as in this answer here. I want to save my model after each epoch, but after training for a few epochs the training crashes. I have read that this happens because the input is added as a constant tensor to the graph. There are suggested solutions here to use a tf.placeholder to work around this, but unfortunately it doesn't solve the problem for me. The input pipeline looks as follows:

....
filenames = [P_1]
dataset = tf.data.TFRecordDataset(filenames)
def _parse_function(example_proto):
    keys_to_features = {'data': tf.VarLenFeature(tf.float32)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return tf.sparse_tensor_to_dense(parsed_features['data'])
# Parse the record into tensors.
dataset = dataset.map(_parse_function)
# Shuffle the dataset
dataset = dataset.shuffle(buffer_size=1000)
# Repeat the input indefinitely
dataset = dataset.repeat()      
# Generate batches     
dataset = dataset.batch(Batch_size) 
# Create a one-shot iterator
iterator = dataset.make_one_shot_iterator()
data = iterator.get_next()   
....
for i in range(epochs):
    for ii in range(iteration):
        image = sess.run(data)
        ....
    saver.save(sess, 'filename')
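
For reference, a minimal sketch of the placeholder-fed variant that was suggested (and which did not solve it in my case) could look roughly like this, assuming TF 1.x and the same P_1 record file as above:

# Feed the file list through a placeholder so nothing is baked into the graph as a constant
filenames = tf.placeholder(tf.string, shape=[None])
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=1000)
dataset = dataset.repeat()
dataset = dataset.batch(Batch_size)
# An initializable iterator lets the placeholder be fed at run time
iterator = dataset.make_initializable_iterator()
data = iterator.get_next()
sess.run(iterator.initializer, feed_dict={filenames: [P_1]})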

The error message looks as follows

[libprotobuf FATAL external/protobuf_archive/src/google/protobuf/message_lite.cc:68] CHECK failed: (byte_size_before_serialization) == (byte_size_after_serialization): tensorflow.GraphDef was modified concurrently during serialization.
terminate called after throwing an instance of 'google::protobuf::FatalException'  
what():  CHECK failed: (byte_size_before_serialization) == (byte_size_after_serialization): tensorflow.GraphDef was modified concurrently during serialization.
Aborted
D_negn
  • Do you have more to the model (i.e. do the `....`s contain code)? This is a horribly worded error saying that the checkpoint you want to save is too large a file, so I don't think that the dataset code is your issue. – McAngus Oct 09 '18 at 15:05
  • The model is an autoencoder, which contains de/convolution layers in the encoder and decoder. When I saved the images in a CSV file and read them directly, this was not a problem, which is why I thought it is caused by the dataset. – D_negn Oct 10 '18 at 14:19

1 Answer


The problem looks like it is in the _parse_function. Make sure the features are parsed in the same way they were written when you created the TFRecord file, for example that they have the same data type.
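
As an illustration, a matching writer/parser pair for the 'data' feature from the question might look roughly like this (a sketch assuming the records hold a flat list of floats; the write_record helper name is made up for illustration):

import tensorflow as tf

# Writing: store the flattened float values under the key 'data'
def write_record(values, path):  # hypothetical helper, for illustration only
    with tf.python_io.TFRecordWriter(path) as writer:
        feature = {'data': tf.train.Feature(float_list=tf.train.FloatList(value=values))}
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())

# Parsing: the key and dtype must match what was written above
def _parse_function(example_proto):
    keys_to_features = {'data': tf.VarLenFeature(tf.float32)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return tf.sparse_tensor_to_dense(parsed_features['data'])

If the writer used a different key or, say, an int64 list instead of a float list, the parser would fail or return unexpected values, so the keys and dtypes on both sides have to agree.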

Habeshaw