I am modifying the DeepLab network. I added a node to the MobileNet-v3 feature extractor's first layer that reuses the existing variables. Since no extra parameters are needed, I should in theory be able to load the old checkpoint.
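For context, the change is conceptually like the following minimal TF 1.x sketch (not the actual DeepLab code; the scope name, layer, and channel count are illustrative): the extra node runs under the same variable scope with reuse enabled, so it shares the first layer's variables instead of creating new ones.

import tensorflow as tf  # TF 1.x, as used by the DeepLab research code

def first_conv_with_extra_node(inputs):
    # Illustrative only: both calls name the layer 'Conv', so with
    # AUTO_REUSE the second call shares the first call's variables
    # and no new parameters are created.
    with tf.variable_scope('MobilenetV3', reuse=tf.AUTO_REUSE):
        original = tf.layers.conv2d(inputs, 16, 3, strides=2,
                                    padding='same', name='Conv')
        extra = tf.layers.conv2d(inputs, 16, 3, strides=2,
                                 padding='same', name='Conv')
    return original + extra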
Here is the situation I can't understand:
When I start training in a new, empty folder and load the checkpoint like this:
python "${WORK_DIR}"/train.py \
#--didn't change other parameters \
--train_logdir="${EXP_DIR}/train" \
--fine_tune_batch_norm=true \
--tf_initial_checkpoint="init/deeplab/model.ckpt"
I get an error:
ValueError: Total size of new array must be unchanged for MobilenetV3/Conv/BatchNorm/gamma lh_shape: [(16,)], rh_shape: [(480,)]
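The shapes stored in the checkpoint itself can be listed for comparison (a small inspection script; the path is the same one passed to --tf_initial_checkpoint):

import tensorflow as tf

# Print the stored shape of every first-layer batch-norm variable so it
# can be compared against the (16,)-vs-(480,) mismatch in the error.
for name, shape in tf.train.list_variables('init/deeplab/model.ckpt'):
    if 'MobilenetV3/Conv/BatchNorm' in name:
        print(name, shape)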
BUT if I start training in a new, empty folder without loading any checkpoint:
python "${WORK_DIR}"/train.py \
#--didn't change other parameters \
--train_logdir="${EXP_DIR}/train" \
--fine_tune_batch_norm=false \
#--tf_initial_checkpoint="init/deeplab/model.ckpt" #i.e. no checkpoint
training starts smoothly.
What confuses me even more: if, in that same folder (which has already served as the train_logdir for the run without a checkpoint), I then start training with the checkpoint, training also starts without error:
# same command as the first code block
python "${WORK_DIR}"/train.py \
  --train_logdir="${EXP_DIR}/train" \
  --fine_tune_batch_norm=true \
  --tf_initial_checkpoint="init/deeplab/model.ckpt"
How can this happen? Does --train_logdir somehow store the shapes of the batch-normalization parameters from the last training run?
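My only guess is that the second run resumes from whatever the first run saved into the log directory rather than from --tf_initial_checkpoint. That can at least be checked (the literal path stands in for "${EXP_DIR}/train"):

import tensorflow as tf

# If this prints a checkpoint path, the trainer is resuming from
# train_logdir and --tf_initial_checkpoint is effectively ignored.
print(tf.train.latest_checkpoint('/path/to/exp/train'))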