I'm trying to train a DeepLabv3 model with the mobilenetv3_small_seg architecture. Training finishes, but every prediction is a completely blank mask with no class predicted. These are the steps I followed:
I cloned the official repository in Google Colab.
I prepared a dataset with only one class (segmenting the lips in a face), following the PASCAL VOC 2012 dataset format. I created RGB masks in which the lips are green (0, 255, 0), surrounded by a white boundary (255, 255, 255), on a black background (0, 0, 0), as shown below.
I then converted each RGB mask into a single-channel 8-bit PNG with background: 0, foreground: 1 and boundary: 255, with the help of this script, as shown below:
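A minimal sketch of that RGB-to-label conversion, assuming each mask is loaded as an (H, W, 3) uint8 array. It is not the script referenced above: instead of exact colour matching, it assigns every pixel to the nearest of the three palette colours, so slightly-off colours from anti-aliasing or JPEG compression still map to 0, 1 or 255. The names `rgb_to_index_mask` and `convert_file` are hypothetical, and `convert_file` assumes Pillow is installed.

```python
import numpy as np

# Palette from the description: black background, green lips, white boundary.
PALETTE = np.array([[0, 0, 0], [0, 255, 0], [255, 255, 255]], dtype=np.int32)
# Label written for each palette entry: background 0, lips 1, boundary 255 (ignored).
LABELS = np.array([0, 1, 255], dtype=np.uint8)


def rgb_to_index_mask(rgb):
    """Map an (H, W, 3) RGB mask to an (H, W) uint8 label mask.

    Each pixel gets the label of the nearest palette colour (squared RGB distance).
    """
    rgb = np.asarray(rgb, dtype=np.int32)[..., :3]  # drop an alpha channel if present
    # Broadcast to (H, W, 3 palette entries, 3 channels), then sum over channels.
    dist = ((rgb[:, :, None, :] - PALETTE[None, None, :, :]) ** 2).sum(axis=-1)
    return LABELS[dist.argmin(axis=-1)]


def convert_file(src_path, dst_path):
    """Read an RGB mask image and save it as a single-channel 8-bit PNG."""
    from PIL import Image  # hypothetical helper; requires Pillow

    rgb = np.array(Image.open(src_path).convert("RGB"))
    # A 2-D uint8 array is saved as an 8-bit grayscale ("L") image.
    Image.fromarray(rgb_to_index_mask(rgb)).save(dst_path)
```

Opening a converted PNG and calling `np.unique` on it should then show only the values 0, 1 and 255.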
I then successfully converted the dataset to TFRecord format by modifying this script.
Then I added my dataset description to data_generator.py with ignore_label=255 and num_classes=2.
Finally I started training with the following command:

    !python train.py \
        --logtostderr \
        --training_number_of_steps=10000 \
        --train_split="val" \
        --model_variant="mobilenet_v3_small_seg" \
        --decoder_output_stride=16 \
        --train_crop_size="256,256" \
        --train_batch_size=16 \
        --dataset="pqr" \
        --save_interval_secs=600 \
        --save_summaries_secs=300 \
        --save_summaries_images=True \
        --log_steps=200 \
        --train_logdir=${PATH_TO_TRAIN_DIR} \
        --dataset_dir=${PATH_TO_DATASET}

After training completed, I tested the model on several different images. The output is a (256, 256) array in which every value is 0; not a single pixel is 1 or any other value.
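One way to make the "all zeros" observation precise is to count the values in each predicted mask. A small sketch, assuming the prediction is available as a NumPy array; `summarize_mask` is a hypothetical helper, not part of the DeepLab code.

```python
import numpy as np


def summarize_mask(pred, labels=(0, 1)):
    """Count pixels per expected label in a predicted mask.

    Returns (summary, unexpected, fg_fraction): summary maps each expected label
    to its pixel count, unexpected holds any other values found, and fg_fraction
    is the share of pixels predicted as label 1.
    """
    pred = np.asarray(pred)
    values, counts = np.unique(pred, return_counts=True)
    found = dict(zip(values.tolist(), counts.tolist()))
    summary = {label: found.get(label, 0) for label in labels}
    unexpected = {v: c for v, c in found.items() if v not in labels}
    fg_fraction = summary.get(1, 0) / pred.size
    return summary, unexpected, fg_fraction
```

For the output described above, this would report every pixel under label 0 and a foreground fraction of 0.0.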
I'm new to machine learning. I would like to know:
- What is wrong with my process? I have watched many tutorials but couldn't find the answer.
- Is there anything wrong with my dataset? It contains 2000 images in total.
- I couldn't find pretrained weights for mobilenetv3_small. If anybody knows where to get them, please share them so I can use transfer learning.
- I set the number of classes to 2 (background and foreground). Is that right?
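To check the dataset and the two-class setup, the converted masks can be scanned for their label distribution. Lips cover only a small part of a face image, so the share of label 1 relative to label 0 is worth knowing, and any value other than 0, 1 or 255 would point to a conversion problem. A sketch, assuming the masks are the single-channel PNGs described above; `label_histogram` and `iter_masks` are hypothetical helpers, and `iter_masks` requires Pillow.

```python
import numpy as np


def label_histogram(masks, num_classes=2, ignore_label=255):
    """Accumulate pixel counts over an iterable of (H, W) label masks.

    Returns (counts, ignored, other): counts[c] is the number of pixels with
    label c, ignored is the number of ignore_label pixels, and other counts
    values that are neither a valid class nor ignore_label.
    """
    counts = np.zeros(num_classes, dtype=np.int64)
    ignored = 0
    other = 0
    for mask in masks:
        mask = np.asarray(mask)
        valid = (mask >= 0) & (mask < num_classes)
        counts += np.bincount(mask[valid].ravel(), minlength=num_classes)
        ignored += int((mask == ignore_label).sum())
        other += int((~valid & (mask != ignore_label)).sum())
    return counts, ignored, other


def iter_masks(paths):
    """Yield label masks from PNG files as NumPy arrays."""
    from PIL import Image  # requires Pillow

    for path in paths:
        yield np.array(Image.open(path))
```

With `counts, ignored, other = label_histogram(iter_masks(paths))`, the value `counts / counts.sum()` gives the fraction of labelled pixels belonging to background and lips.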