
I'd be thankful for any thoughts, tips or links on this:

Using TF 1.10 and the recent object detection API (GitHub, 2018-08-18) I can do box and mask prediction on the PETS dataset as well as on my own proof-of-concept dataset: Mask result


But when training on the Cityscapes traffic signs (single class) I am having trouble achieving any results. I have adapted the anchors to account for the much smaller objects, and the RPN at least seems to be doing something useful: RPN after 17k steps

However, the box predictor does not kick in at all: I am not getting any boxes, let alone masks.


My pipelines are mostly or even exactly like the sample configs, so I'd expect either a problem with this specific type of data or a bug.

Would you have any tips/links on how to (either)

  • visualize the RPN results when using 2 or 3 stages? (Running with only one stage does show them, but how would one force that?)
  • train the RPN first and continue with boxes later?
  • investigate where/why the boxes get lost? (I get predictions with zero scores, while evaluation reports zero classification error)

1 Answer


The solution finally turned out to be a combination of multiple issues:

Results already showing after only a few epochs

  • The parameter from_detection_checkpoint: true is deprecated and should be replaced by fine_tune_checkpoint_type: 'detection'. However, without either of them the framework seems to default to 'classification', which breaks the whole idea of the object detection framework. It is not a good idea to rely on the defaults here (see the config sketch after this list).
  • My data wasn't prepared well enough. I had boxes with zero width and/or height (for whatever reason), which I now filter out (see the Python sketch after this list). I also removed masks for instances that were disconnected.

  • Using the keep_aspect_ratio_resizer together with random_crop_image and random_coef: 0.0 does not seem to allow for the full resolution, as the resizer seems to be applied before the random cropping. I now split my input images into (vertical) stripes [to save memory] and apply random_crop_image with a small min_area so it does not skip the small features. Since the memory usage is dealt with, I can also allow for max_area: 1 and a random coefficient > 0 (see the augmentation sketch after this list).

  • One potential problem also arose from the fact that I (so far) only considered a single class. This might be a problem either for the framework or for the activation function in the network. However, once the other issues were fixed, the single class did not seem to cause any additional problems.

  • Last but not least, I updated the sources to 2018-10-02, but didn't go through all the modifications in detail.
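
For reference, a minimal train_config sketch with the checkpoint type set explicitly; the checkpoint path is a placeholder and the rest follows the sample configs rather than my exact pipeline:

    train_config {
      fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
      # explicit replacement for the deprecated from_detection_checkpoint: true;
      # leaving both out appears to fall back to 'classification'
      fine_tune_checkpoint_type: "detection"
    }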
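
A rough Python sketch of the kind of filtering I mean when building the TFRecords; the variable names are made up, and it assumes normalized corner coordinates plus parallel lists of classes and masks:

    def filter_degenerate_boxes(xmins, xmaxs, ymins, ymaxs, classes, masks,
                                min_size=1e-6):
        """Drop annotations whose box width or height is (close to) zero."""
        kept = [i for i in range(len(xmins))
                if (xmaxs[i] - xmins[i]) > min_size
                and (ymaxs[i] - ymins[i]) > min_size]
        pick = lambda values: [values[i] for i in kept]
        return (pick(xmins), pick(xmaxs), pick(ymins), pick(ymaxs),
                pick(classes), pick(masks))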
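
And an augmentation sketch along those lines; the concrete numbers are only illustrative, not the values I ended up with:

    data_augmentation_options {
      random_crop_image {
        # small min_area so crops may zoom in on the small traffic signs
        min_area: 0.1
        # still allow crops up to the full image
        max_area: 1.0
        # > 0: keep the original (uncropped) image with some probability
        random_coef: 0.25
      }
    }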

I hope my findings save others some time and trouble.
