How to optimize re-trained ssd_mobilenet_v2_coco for tensorflowjs inference?

Question

I am trying to retrain a MobileNet v2 SSD model for custom object detection, loosely following this tutorial. My end goal is a web_model I can query for the scores, class IDs, and number of detections. The exported inference model works in Python, but after conversion to TensorFlow.js it throws errors in the browser.

It feels like my pipeline is missing a step that makes the inference graph convertible for the web. I suspect the problem is that model_main.py sets is_training=True and that this setting leaks into the final inference model. I can't find any documentation or tutorial on how to generate a non-training (inference-only) model from my trained model.
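
One way to check that suspicion is to look for fused batch-norm nodes that are still marked as training in the exported frozen graph. The sketch below assumes a TF 1.x GraphDef such as the one returned by get_graph_def_from_file further down; find_training_batch_norms is a hypothetical helper, not a TensorFlow API. It relies on the is_training attribute of the FusedBatchNorm ops, whose registered default is true.

```python
# Hypothetical helper (not part of TensorFlow): list fused batch-norm nodes
# that are still in training mode. Works on a TF 1.x GraphDef or any object
# with the same .node / .op / .name / .attr structure.
FUSED_BN_OPS = {"FusedBatchNorm", "FusedBatchNormV2", "FusedBatchNormV3"}


def find_training_batch_norms(graph_def):
    """Return the names of fused batch-norm nodes whose is_training is true."""
    flagged = []
    for node in graph_def.node:
        if node.op not in FUSED_BN_OPS:
            continue
        # Test membership first: indexing a protobuf map inserts a default entry.
        # A missing attr means the op default, which is is_training=true.
        if "is_training" not in node.attr or node.attr["is_training"].b:
            flagged.append(node.name)
    return flagged
```

If the suspicion is right, calling it on the frozen graph from export_inference_graph.py would return a non-empty list.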

I've been using tensorflow-gpu 1.13.1 and model_main.py to retrain the ssd_mobilenet_v2_coco model provided by the TensorFlow Object Detection model zoo. I've also tried the legacy train.py with TensorFlow 1.14.0.

To convert the model to TensorFlow.js I've used both tensorflowjs 1.2.2.1 and 0.8.6; both produce the same error when the result runs in the browser.

I've also tried performing intermediate graph transforms on the frozen model before converting it using 0.8.6.

Training the model:

python model_main.py --model_dir=output --pipeline_config_path=training\ssd_mobilenet_v2_coco.config --num_train_steps=200000

Exporting inference graph:

python export_inference_graph.py --input_type=image_tensor --output_directory=output_inf --pipeline_config_path=training\ssd_mobilenet_v2_coco.config --trained_checkpoint_prefix=neg_32\model.ckpt-XXXX

Converting using tfjs 1.2.2.1:

tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model --saved_model_tags=serve --signature_name=serving_default output_inf\saved_model output_inf\web_model
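
Before testing in the browser, it can help to see which ops survived conversion, for example whether FusedBatchNorm nodes are still present. A minimal sketch, assuming the converted graph model keeps its GraphDef nodes as a list under modelTopology.node in model.json (treat that layout as an assumption); op_histogram is a hypothetical helper.

```python
import json
from collections import Counter


def op_histogram(model_json_path):
    """Count op types in a tfjs graph model's model.json (hypothetical helper)."""
    with open(model_json_path, "r", encoding="utf-8") as f:
        model = json.load(f)
    # Assumption: the converted GraphDef is stored under modelTopology.node.
    nodes = model.get("modelTopology", {}).get("node", [])
    return Counter(node["op"] for node in nodes)
```

For example, print(op_histogram(r"output_inf\web_model\model.json").most_common()) lists the ops from most to least frequent.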

Testing model in browser:

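import * as tf from '@tensorflow/tfjs';

class Detector {
    async init() {
        try {
            this.model = await tf.loadGraphModel('/web_model/model.json');
        } catch (err) {
            console.log(err);
        }
    }

    async detect(frame) {
        const { model } = this;

        const INPUT_TENSOR = 'image_tensor';
        const OUTPUT_TENSOR = 'num_detections';
        // image_tensor is a uint8 placeholder in Object Detection API exports;
        // TensorFlow.js has no uint8 dtype, so feed an int32 tensor.
        const zeros = tf.zeros([1, 300, 300, 3], 'int32');

        console.log("executing model");
        // Declare output: assigning to an undeclared name throws in an ES module.
        const output = await model.executeAsync({[INPUT_TENSOR]: zeros}, OUTPUT_TENSOR);
        console.log(output);
        zeros.dispose();
        return output;
    }
}

export default Detector;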
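
The detector above only requests num_detections. Assuming all four standard Object Detection API outputs (num_detections, detection_boxes, detection_scores, detection_classes) are fetched for a single image, the post-processing does not depend on the runtime; a sketch in Python follows. parse_detections, min_score, and the pixel conversion are illustrative choices, and the sketch assumes boxes are normalized [ymin, xmin, ymax, xmax] and class IDs are label-map IDs stored as floats, as in Object Detection API exports.

```python
def parse_detections(num_detections, boxes, scores, classes,
                     image_width, image_height, min_score=0.5):
    """Turn one image's raw detection outputs into a list of dicts.

    Hypothetical helper. Inputs are per-image (batch dimension removed):
    boxes are normalized [ymin, xmin, ymax, xmax], classes are floats,
    num_detections is a float count of valid rows.
    """
    results = []
    for i in range(int(num_detections)):
        if scores[i] < min_score:
            continue  # drop low-confidence detections
        ymin, xmin, ymax, xmax = boxes[i]
        results.append({
            "class_id": int(classes[i]),
            "score": float(scores[i]),
            # Scale to pixels as (left, top, right, bottom).
            "box": (xmin * image_width, ymin * image_height,
                    xmax * image_width, ymax * image_height),
        })
    return results
```

Rows beyond num_detections are padding and are ignored; the score threshold is a tuning choice rather than part of the model.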

Intermediate transforms:

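import tensorflow as tf
from tensorflow.python.framework import graph_util, ops
from tensorflow.tools.graph_transforms import TransformGraph

# file_name (frozen graph path), model_dir and out_name are defined
# elsewhere in the script.


def get_graph_def_from_file(graph_filepath):
    # Parse a binary GraphDef such as frozen_inference_graph.pb.
    with ops.Graph().as_default():
        with tf.gfile.GFile(graph_filepath, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            return graph_def


graph_def = get_graph_def_from_file(file_name)

input_node = ['image_tensor']
# One entry per output node, not a single comma-separated string,
# so that protected_nodes actually protects each output.
output_node = ['num_detections', 'detection_scores',
               'detection_boxes', 'detection_classes']
transforms = [
    'remove_nodes(op=Identity, op=CheckNumerics)',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms(ignore_errors=true)',
    'merge_duplicate_nodes',
    'strip_unused_nodes'
]

transformed_graph_def = graph_util.remove_training_nodes(graph_def, protected_nodes=output_node)

# Transform the output of remove_training_nodes rather than the
# original graph_def, so that step is not discarded.
transformed_graph_def = TransformGraph(
    transformed_graph_def,
    input_node,
    output_node,
    transforms)

tf.train.write_graph(transformed_graph_def,
                     logdir=model_dir,
                     as_text=False,
                     name=out_name)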

I expected the web model to return detection results for the test tensor. Instead, TensorFlow.js throws the following error when the JavaScript code runs:

Uncaught (in promise) Error: Operands could not be broadcast together with shapes 1,150,150,32 and 0.
    at Ir (tfjs:2)
    at new bi (tfjs:2)
    at e.batchNormalization (tfjs:2)
    at kt.runKernel.$x (tfjs:2)
    at tfjs:2
    at t.scopedRun (tfjs:2)
    at t.runKernel (tfjs:2)
    at os (tfjs:2)
    at batchNorm (tfjs:2)
    at jv (tfjs:2)
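
Under broadcasting rules, batchNorm has to broadcast the mean, variance, offset, and scale against the 1x150x150x32 activation. A per-channel vector of length 32 broadcasts fine, but a length-0 vector cannot. That is consistent with, though it does not prove, an empty mean/variance input. In TF 1.x, tf.nn.fused_batch_norm substitutes empty constants for mean and variance when none are passed in training mode, which would fit the is_training suspicion. The hypothetical broadcast_shape helper below reproduces the check with NumPy-style rules.

```python
def broadcast_shape(a, b):
    """Return the NumPy-style broadcast shape of a and b, or raise ValueError.

    Hypothetical helper. Dimensions are compared right to left; two sizes
    are compatible when equal or when one of them is 1. Missing leading
    dimensions count as 1.
    """
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {a} and {b}: {da} vs {db}")
        result.append(da if db == 1 else db)
    return tuple(reversed(result))
```

broadcast_shape((1, 150, 150, 32), (32,)) succeeds, while broadcast_shape((1, 150, 150, 32), (0,)) raises, mirroring the tfjs error.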

Applying fold_old_batch_norms in TransformGraph then produces this error:

2019-07-07 22:16:11.717749: I tensorflow/tools/graph_transforms/transform_graph.cc:317] Applying fold_old_batch_norms
Traceback (most recent call last):
  File "xxx/optimize.py", line 154, in <module>
    optimize_graph(model_dir, output_frozen_fname, transforms, output_nodes, output_optimized_fname)
  File "xxx/optimize.py", line 135, in optimize_graph
    transforms)
  File "xxx\venv\lib\site-packages\tensorflow\tools\graph_transforms\__init__.py", line 51, in TransformGraph
    transforms_string, status)
  File "xxx\venv\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Beta input to batch norm has bad shape: [32]

Hi, did you get it running? I am getting some output, but could you please guide me on how to interpret it? — Saurabh Chauhan, Jan 03 '20 at 08:58
