Trying to retrain a TensorFlow model, input and output nodes disappear

Question

I am trying to retrain the TensorFlow DeepLab model using MobileNetV2. I downloaded the mobilenetv2_coco_voc_trainaug checkpoint from the DeepLab model zoo (https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md, about halfway down the page). I would like my retrained output to have the same graph as this one, but different parameters. (Well, almost the same graph: the final tensor should probably have a different shape, because I am working with a different number of classes.)

I assembled my own images into a tfrecord, labelled with just one class for now. This is practice for a dataset with 4 classes.

I then ran the following to retrain the network, producing .pbtxt, .meta, .index and .data-00000-of-00001 files:

PATH_TO_INITIAL_CHECKPOINT=/path/to/unzipped/files/model.ckpt-30000.index
PATH_TO_TRAIN_DIR=/path/to/checkpoints/
PATH_TO_DATASET=/path/to/tfrecord
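# --training_number_of_steps: 900 for this quick test, 90000 for a full run.
# --train_crop_size is a repeated flag: the two values are height and width.
python /path/to/tensorflow/models/research/deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=900 \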
    --train_split="train" \
    --model_variant="mobilenet_v2" \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=128 \
    --train_crop_size=128 \
    --train_batch_size=1 \
    --dataset="cityscapes" \
    --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
    --train_logdir=${PATH_TO_TRAIN_DIR} \
    --dataset_dir=${PATH_TO_DATASET} \
    --initialize_last_layer=False \
    --last_layers_contain_logits_only=True \
    --fine_tune_batch_norm=False
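TensorFlow 1.x usually refers to a checkpoint by its prefix (for example model.ckpt-30000) rather than by the .index or .data-00000-of-00001 file, and tf.train.Saver also writes a small text file named checkpoint into the training directory that records the latest prefix. A minimal sketch of reading that prefix, assuming the file contains the usual model_checkpoint_path: "..." line; latest_checkpoint_prefix is a hypothetical helper name:

```shell
# Hypothetical helper: print the latest checkpoint prefix recorded in a
# TF 1.x training directory. Assumes the "checkpoint" text file written by
# tf.train.Saver, containing a line such as:
#   model_checkpoint_path: "model.ckpt-900"
latest_checkpoint_prefix() {
  local train_dir="$1"
  local prefix
  prefix=$(sed -n 's/^model_checkpoint_path: "\(.*\)"$/\1/p' "${train_dir}/checkpoint")
  # No recorded checkpoint (or no "checkpoint" file): signal failure.
  [ -n "$prefix" ] || return 1
  case "$prefix" in
    /*) echo "$prefix" ;;                     # already an absolute path
    *)  echo "${train_dir%/}/${prefix}" ;;    # relative to the training dir
  esac
}
```

For example, latest_checkpoint_prefix /path/to/checkpoints/ might print something like /path/to/checkpoints/model.ckpt-900 after a 900-step run.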

Running bazel's summarize_graph on the downloaded file gives:

Found 1 possible inputs: (name=ImageTensor, type=uint8(4), shape=[1,?,?,3]) 
No variables spotted.
Found 1 possible outputs: (name=SemanticPredictions, op=Slice)
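For reference, summarize_graph is one of TensorFlow's graph_transforms tools and is built with bazel. A sketch of building and running it, assuming the working directory is the root of a TensorFlow source checkout; the function names and the SUMMARIZE_GRAPH override variable are hypothetical:

```shell
# Assumption: run from the root of a TensorFlow source checkout; bazel puts
# the built binary under bazel-bin/. Override SUMMARIZE_GRAPH to use another path.
SUMMARIZE_GRAPH=${SUMMARIZE_GRAPH:-bazel-bin/tensorflow/tools/graph_transforms/summarize_graph}

# Build the summarize_graph tool once.
build_summarize_graph() {
  bazel build tensorflow/tools/graph_transforms:summarize_graph
}

# Print the likely inputs/outputs of a GraphDef (binary .pb or text .pbtxt).
summarize() {
  "$SUMMARIZE_GRAPH" --in_graph="$1"
}
```

Usage would then be build_summarize_graph followed by summarize /path/to/unzipped/files/frozen_inference_graph.pb, assuming that is the name of the frozen graph inside the downloaded archive.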

When I scan the nodes of the .pbtxt file produced by training, I can't find any nodes called ImageTensor or SemanticPredictions. I have tried with TensorBoard, bazel's summarize_graph, and programmatically (e.g. here, here, or here). For this graph, summarize_graph reports "No inputs spotted." and "Found 664 possible outputs".
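One programmatic way to scan a text-format graph without loading TensorFlow is to read the node names directly from the .pbtxt. A rough sketch, assuming the layout that tf.train.write_graph produces for text graphs (each top-level node starts with a "node {" line, followed by its name: field); list_pbtxt_nodes and check_io_nodes are hypothetical helpers:

```shell
# Hypothetical helper: list top-level node names in a text-format GraphDef.
# Assumes each node block starts with "node {" and that its first field is
# a line of the form:   name: "some/node"
list_pbtxt_nodes() {
  awk '
    /^node [{]/ { in_node = 1; next }
    in_node && /^  name: "/ {
      name = $0
      sub(/^  name: "/, "", name)   # strip the field prefix
      sub(/"$/, "", name)           # strip the closing quote
      print name
      in_node = 0
    }
  ' "$1"
}

# Hypothetical helper: report whether the input/output node names used by
# the downloaded frozen model also exist in the given .pbtxt file.
check_io_nodes() {
  local pbtxt="$1" node
  for node in ImageTensor SemanticPredictions; do
    if list_pbtxt_nodes "$pbtxt" | grep -qx "$node"; then
      echo "found: $node"
    else
      echo "missing: $node"
    fi
  done
}
```

For example, check_io_nodes /path/to/checkpoints/graph.pbtxt prints one found: or missing: line for each of the two names.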

This then leads to problems with freeze_graph.py. If I choose output_node_names from what I can see in TensorBoard, freeze_graph.py runs and I get a frozen graph. But running that model gives me:

TypeError: Cannot interpret feed_dict key as Tensor: The name 
'ImageTensor:0' refers to a Tensor which does not exist. The operation, 
'ImageTensor', does not exist in the graph.
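A sketch of the kind of freeze_graph.py call described above, assuming a TensorFlow 1.x checkout (the FREEZE_GRAPH_PY path is a placeholder) and that train.py wrote graph.pbtxt into the training directory. In freeze_graph.py, --input_checkpoint takes a checkpoint prefix and --output_node_names must list nodes that exist in the input graph; freeze_training_graph is a hypothetical wrapper:

```shell
# Placeholder path to freeze_graph.py in a TensorFlow 1.x source checkout.
FREEZE_GRAPH_PY=${FREEZE_GRAPH_PY:-/path/to/tensorflow/tensorflow/python/tools/freeze_graph.py}

# Hypothetical wrapper.
# Usage: freeze_training_graph TRAIN_DIR CHECKPOINT_PREFIX OUTPUT_NODES OUT_PB
#   CHECKPOINT_PREFIX: e.g. /path/to/checkpoints/model.ckpt-900 (no .index)
#   OUTPUT_NODES: comma-separated node names that exist in graph.pbtxt
freeze_training_graph() {
  local train_dir="$1" ckpt_prefix="$2" output_nodes="$3" out_pb="$4"
  python "$FREEZE_GRAPH_PY" \
    --input_graph="${train_dir%/}/graph.pbtxt" \
    --input_binary=false \
    --input_checkpoint="$ckpt_prefix" \
    --output_node_names="$output_nodes" \
    --output_graph="$out_pb"
}
```

freeze_graph.py keeps only the listed output nodes and the nodes they depend on, and it does not rename anything. So if the training graph has no node called ImageTensor, the frozen graph will not have one either, which would be consistent with the TypeError above.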

I'm definitely doing something wrong here. The question is: what? I suspect it could be the arguments I supply to train.py, but really, that's just a shot in the dark. It could also be that this is not how train.py is intended to be used, or that DeepLab's train.py is not compatible with MobileNetV2.

Edit: After a closer look at the options available in train.py, I have updated my command. Clearing previous failed models out of the training directory (PATH_TO_TRAIN_DIR) also helped me avoid this error:

Restoring from checkpoint failed. This is most likely due to a mismatch 
between the current graph and the graph from the checkpoint. Please ensure 
that you have not altered the graph expected based on the checkpoint.
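As far as I can tell, DeepLab's train.py resumes from any checkpoint already present in --train_logdir instead of loading --tf_initial_checkpoint, so leftovers from an earlier run with a different graph can trigger the restore error above. A minimal sketch for clearing the directory before a fresh run; reset_train_dir is a hypothetical helper, and it deletes everything in the given directory:

```shell
# Hypothetical helper: empty a training directory before a fresh train.py run.
# Destructive: removes everything under the given directory.
reset_train_dir() {
  local dir="$1"
  # Basic safety check: refuse an empty path or the filesystem root.
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "refusing to reset '$dir'" >&2
    return 1
  fi
  rm -rf -- "$dir"
  mkdir -p -- "$dir"
}
```

For example, run reset_train_dir "${PATH_TO_TRAIN_DIR}" before re-running the train.py command.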

I did put a bounty on this question, but it has expired. If anyone manages to come up with an answer, I'd be happy to put the bounty up again. (Related question: how come the bounty isn't returned to me when it expires? I guess you wouldn't want people to offer a bounty and then keep it for themselves, but since there were no answers, that doesn't really apply here.) — craq, Sep 20 '18 at 03:40
oh. "All bounties are paid for up front and non-refundable under any circumstances." Apparently it's intended to be more comparable to advertising than a reward. https://stackoverflow.com/help/bounty — craq, Sep 20 '18 at 03:53
