
I am currently trying to train classification networks using the TensorFlow API (https://github.com/tensorflow/models). After creating TFRecords for my data set (stored in research/slim/data), I train the networks using the following command:

python research/slim/train_image_classifier.py \
--train_dir=research/slim/training/current_model \
--dataset_name=my_dataset \
--dataset_split_name=train \
--dataset_dir=research/slim/data \
--model_name=vgg_16 \
--checkpoint_path=research/slim/training/vgg_16_2016_08_28/vgg_16.ckpt \
--checkpoint_exclude_scopes=vgg_16/fc7,vgg_16/fc8 \
--trainable_scopes=vgg_16/fc7,vgg_16/fc8 \
--batch_size=5 \
--log_every_n_steps=10 \
--max_number_of_steps=1000

This works well for several classification networks (Inception, ResNet, MobileNet), but not as well for VGG-Net. I am fine-tuning the following VGG-Net 16 model: http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz

Training this model runs without errors, but the loss increases instead of decreasing. Maybe this is due to my choice of 'checkpoint_exclude_scopes'.

Is it correct to use the last fully connected layers (fc7 and fc8) as checkpoint_exclude_scopes?

The same question arises when freezing the graph, for the parameter 'output_node_names'. For InceptionV3, for example, it works with 'output_node_names=InceptionV3/Predictions/Reshape_1'. But how should this parameter be set for VGG-Net? I tried the following:

python research/slim/freeze_graph.py \
--input_graph=research/slim/training/current_model/graph.pb \
--input_checkpoint=research/slim/training/current_model/model.ckpt \
--input_binary=true \
--output_graph=research/slim/training/current_model/frozen_inference_graph.pb \
--output_node_names=vgg_16/fc8

I didn't find any layer containing 'Predictions' or 'Logits' in the VGG-Net model, so I am not sure.

Thank you for helping!

golden96371
  • Did it work for MobileNet? If yes, what values did you pass in trainable_scopes and checkpoint_exclude_scopes, and which checkpoint file did you use in checkpoint_path (i.e., the checkpoint file of the new dataset or the default MobileNet checkpoint file)? Could you please guide me through that? – Dinesh Apr 04 '19 at 13:56
  • Why don't you give the scripts for the model you have issues with (VGG16) instead of InceptionV3? – Anju Paul - Intel Apr 10 '19 at 05:43
  • @Anju Paul - Intel: I just updated the post with the exact commands I used for VGG16. – golden96371 Apr 16 '19 at 09:19
  • @Dinesh: Yes, it works for MobileNet. Here are the parameters I used for MobileNet v1: --trainable_scopes=MobilenetV1/Logits --checkpoint_exclude_scopes=MobilenetV1/Logits --checkpoint_path=mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.ckpt. For freezing the graph, I used --output_node_names=MobilenetV1/Predictions/Reshape_1 – golden96371 Apr 16 '19 at 09:24

1 Answer


I tried to run train_image_classifier.py as in your script with a few changes as mentioned below:

  1. Changed train_dir, dataset_dir and checkpoint_path to my local paths
  2. Since I ran on CPU, added the --clone_on_cpu=True parameter to the command
  3. Removed the parameter --dataset_name=my_dataset since it was throwing an error for me

It ran fine. The loss started as high as 448, then slowly decreased, and by the 1000th step it had dropped to 3.5. It fluctuated considerably, but the overall trend was downward. I am not sure why you did not see the same behavior.
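
With a batch size of 5 the logged loss is noisy, so it can help to smooth it before judging the trend. The following is only a minimal sketch: the function name `smooth_losses`, the decay value, and the intermediate loss values are made up for illustration and are not taken from the actual run (only the first and last values echo the numbers mentioned above).

```python
def smooth_losses(losses, decay=0.9):
    """Return an exponential moving average of a sequence of loss values."""
    smoothed = []
    average = None
    for loss in losses:
        # The first value seeds the average; later values are blended in.
        if average is None:
            average = loss
        else:
            average = decay * average + (1.0 - decay) * loss
        smoothed.append(average)
    return smoothed


# Hypothetical logged losses (illustrative only, not real training output).
logged = [448.0, 120.0, 260.0, 40.0, 95.0, 12.0, 30.0, 3.5]
trend = smooth_losses(logged, decay=0.5)

# True when the smoothed loss at the end is below the smoothed loss at the start.
print(trend[-1] < trend[0])
```

If the smoothed curve still rises over several hundred steps, the increase is probably not just batch noise.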

Regarding your question on checkpoint_exclude_scopes while training and output_node_names while freezing the graph, I think your choice of layers is absolutely fine. However, I would prefer to train only the last fully connected layer (fc8) for faster convergence.
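
To see what the difference between excluding fc7 and fc8 versus only fc8 means in practice, here is a small pure-Python sketch. It assumes that the training script skips a variable on restore when the variable name starts with one of the excluded scopes; the helper `split_by_exclude_scopes` and the short list of variable names are hypothetical and only meant to illustrate the idea, not to reproduce the script.

```python
def split_by_exclude_scopes(variable_names, exclude_scopes):
    """Split variable names into (restored, reinitialized) lists.

    Assumption: a variable is excluded from checkpoint restore when its
    name starts with any of the comma-separated scopes.
    """
    exclusions = [s.strip() for s in exclude_scopes.split(",") if s.strip()]
    restored, reinitialized = [], []
    for name in variable_names:
        if any(name.startswith(scope) for scope in exclusions):
            reinitialized.append(name)
        else:
            restored.append(name)
    return restored, reinitialized


# Illustrative subset of VGG-16 variable names (hypothetical list).
names = [
    "vgg_16/conv5/conv5_3/weights",
    "vgg_16/fc6/weights",
    "vgg_16/fc7/weights",
    "vgg_16/fc8/weights",
]

# Excluding fc7 and fc8: both layers start from fresh weights.
_, fresh_both = split_by_exclude_scopes(names, "vgg_16/fc7,vgg_16/fc8")
# Excluding only fc8: fc7 keeps its pretrained weights.
_, fresh_last = split_by_exclude_scopes(names, "vgg_16/fc8")

print(fresh_both)
print(fresh_last)
```

Under that assumption, passing only vgg_16/fc8 to both --checkpoint_exclude_scopes and --trainable_scopes leaves fc7 with its pretrained weights, so fewer parameters have to be learned from scratch.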

  • Thank you for your answer and for your help. In principle, you did the same as I did, only with another data set. It is good to know that it works for you with these parameters. I will take a closer look at the data set; maybe my current data set is too small for VGG-Net. – golden96371 Apr 23 '19 at 08:05