
I am new to computer vision, but I am trying to code an Android/iOS app that does the following:

Get the live camera preview and try to detect one flat image (a logo or a painting) in it, in real time. Draw a rectangle around the logo if it is found. If there is no match, don't draw the rectangle.
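The "draw a rectangle only if found" rule usually comes down to thresholding the detector's confidence scores. Here is a minimal sketch. It assumes the model returns boxes as normalized [ymin, xmin, ymax, xmax] values with one score per box, which is the TensorFlow Object Detection API output layout. The function name and the 0.5 threshold are hypothetical and would need tuning on real camera frames.

```python
from typing import Optional, Sequence, Tuple

# Pixel rectangle as (left, top, right, bottom).
Rect = Tuple[int, int, int, int]


def best_detection_rect(
    boxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    image_width: int,
    image_height: int,
    min_score: float = 0.5,  # hypothetical threshold, tune on your data
) -> Optional[Rect]:
    """Return the pixel rectangle of the highest-scoring box, or None if no box passes min_score.

    Assumes each box is normalized [ymin, xmin, ymax, xmax] in [0, 1].
    """
    best_index = None
    best_score = min_score
    for i, score in enumerate(scores):
        if score >= best_score:
            best_index, best_score = i, score
    if best_index is None:
        return None  # no match: the caller draws nothing
    ymin, xmin, ymax, xmax = boxes[best_index]
    return (
        round(xmin * image_width),
        round(ymin * image_height),
        round(xmax * image_width),
        round(ymax * image_height),
    )


print(best_detection_rect([[0.1, 0.2, 0.5, 0.6]], [0.9], 640, 480))  # (128, 48, 384, 240)
```

On each preview frame the app would draw the returned rectangle, or clear the overlay when the function returns None.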

I found the TensorFlow Object Detection API to be a good starting point, and support was just announced for importing TensorFlow models into Core ML.

I followed a lot of tutorials to train my own object detector. The training data is the key. I found a pretty good library to generate augmented images, and I have created hundreds of variations of my source image (rotation, skew, etc.). But it failed! This dataset is probably good for image classification (with my image filling the frame), but not for detecting the image in context (in a room).
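Rotation and skew variations like the ones described can be generated with a random perspective warp. The sketch below is NumPy-only: it jitters the four corners, solves for the homography with the direct linear transform (fixing H[2, 2] = 1), and resamples with nearest-neighbour lookup. The function names and the max_shift default are assumptions for illustration, not the augmentation library mentioned above.

```python
import numpy as np


def homography_from_points(src, dst):
    """Solve the 3x3 homography H (with H[2, 2] = 1) mapping 4 src points onto 4 dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def random_perspective(image, max_shift=0.15, rng=None):
    """Move each corner by up to max_shift * (width, height) and warp the image accordingly.

    Returns the warped image (same size, uncovered pixels set to 0) and the four
    moved corner positions (x, y), which locate the warped image in the output.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    corners = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=float)
    moved = corners + rng.uniform(-max_shift, max_shift, size=(4, 2)) * [w, h]
    # Inverse mapping: for every output pixel, find the source pixel it comes from.
    h_inv = homography_from_points(moved, corners)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = h_inv @ pts
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros((h * w,) + image.shape[2:], dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]  # nearest-neighbour sampling
    return out.reshape(image.shape), moved
```

Corners that move outside the canvas are cropped. For detection, the min and max of the moved corners (clipped to the image) would give the bounding box. As the question notes, such variations alone still lack realistic surroundings.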

I think transfer learning is the key. In my case, I used the ssd_mobilenet_v1_coco model as a base. I tried to fake the context of my augmented images with the Random Erasing data augmentation technique, without success.
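For reference, Random Erasing replaces one randomly sized and placed rectangle of a training image with random noise, with some probability p. Below is a minimal NumPy sketch. It assumes uint8 images and uses p = 0.5, an area fraction in [0.02, 0.4], and an aspect ratio in [0.3, 1/0.3] as starting values; these are assumed defaults to tune, not values taken from the question.

```python
import numpy as np


def random_erasing(image, p=0.5, area_range=(0.02, 0.4), min_aspect=0.3, rng=None):
    """With probability p, overwrite one random rectangle of a uint8 image with uniform noise.

    Returns the input unchanged (same object) when no erasing happens,
    otherwise a modified copy.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() >= p:
        return image
    h, w = image.shape[:2]
    out = image.copy()
    for _ in range(100):  # retry until the sampled rectangle fits inside the image
        area = rng.uniform(*area_range) * h * w
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        eh = int(round(np.sqrt(area * aspect)))
        ew = int(round(np.sqrt(area / aspect)))
        if 0 < eh < h and 0 < ew < w:
            y = rng.integers(0, h - eh + 1)
            x = rng.integers(0, w - ew + 1)
            noise = rng.integers(0, 256, size=(eh, ew) + image.shape[2:])
            out[y:y + eh, x:x + ew] = noise.astype(image.dtype)
            return out
    return out
```

Random Erasing simulates occlusion of the object. It does not add the varied backgrounds a detector needs to tell the object apart from a room, which may explain why it did not help here.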

What are my options? Am I tackling the problem the right way? I need the model training to be as fast as possible.

Should I use an indoor/outdoor image classification dataset and paste my image onto those photos at random positions? How important is perspective?
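Pasting the image onto real background photos, as proposed, produces detection samples together with their bounding-box labels. A minimal NumPy-only sketch follows. It assumes the logo and the background have the same number of channels, uses a hypothetical scale_range, and resizes with nearest-neighbour sampling to avoid extra dependencies. A perspective warp could be applied to the logo before pasting to cover viewing angles.

```python
import numpy as np


def paste_on_background(logo, background, scale_range=(0.2, 0.6), rng=None):
    """Paste `logo` at a random scale and position on `background`.

    Returns the composite and the box (xmin, ymin, xmax, ymax) in pixels,
    i.e. the annotation a detector needs for this synthetic sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    bh, bw = background.shape[:2]
    lh, lw = logo.shape[:2]
    # Pasted width is a random fraction of the background width; aspect ratio is kept
    # unless the result has to be clamped to fit the background.
    scale = rng.uniform(*scale_range) * bw / lw
    nh = max(1, min(bh, int(round(lh * scale))))
    nw = max(1, min(bw, int(round(lw * scale))))
    # Nearest-neighbour resize: map each output row/column to a source row/column.
    rows = (np.arange(nh) * lh / nh).astype(int)
    cols = (np.arange(nw) * lw / nw).astype(int)
    resized = logo[rows][:, cols]
    y = int(rng.integers(0, bh - nh + 1))
    x = int(rng.integers(0, bw - nw + 1))
    out = background.copy()
    out[y:y + nh, x:x + nw] = resized
    return out, (x, y, x + nw, y + nh)
```

Generating many such composites from a varied set of indoor scenes gives the detector negative context (everything outside the box) as well as positive examples.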

Thank you!

1 Answer
I have created hundreds of variations of my source image (rotation, skew, etc.). But it failed!

Does that mean your model did not converge, or that its final performance was bad? If your model did not converge, add more data. "Hundreds of samples" is very few. Use more source images, generate more samples, and make your samples as diverse as possible.

I think transfer learning is the key. In my case, I used the ssd_mobilenet_v1_coco model as a base. I tried to fake the context of my augmented images with the Random Erasing data augmentation technique, without success.

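You mean fine-tuning. Did you reduce the labels to two classes (your image and the background) before fine-tuning? In the Object Detection API the background class is implicit, so that means num_classes: 1 in the pipeline config. If you didn't, you surely failed. Oh man, you should at least show your model definition.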
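In the TensorFlow Object Detection API, the classes are listed in a label map whose IDs start at 1, because ID 0 is reserved for the implicit background class. The small helper below builds that text for a single foreground class. The helper name and the class name "logo" are hypothetical, and num_classes in the pipeline config would still need to be set to 1 by hand.

```python
def make_label_map(class_names):
    """Build label map text in the Object Detection API's item { id name } format.

    IDs start at 1; ID 0 is left for the implicit background class.
    """
    items = []
    for i, name in enumerate(class_names, start=1):
        items.append(f"item {{\n  id: {i}\n  name: '{name}'\n}}\n")
    return "".join(items)


# One foreground class: the detector effectively learns "logo vs. background".
label_map = make_label_map(["logo"])
print(label_map)
```

Keeping only one class also means the COCO classification head is replaced by a much smaller one during fine-tuning, which is what the two-label advice amounts to.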

What are my options? Am I tackling the problem the right way? I need the model training to be as fast as possible.

To make training finish faster, just add more GPUs and train on all of them in parallel. If you can't afford to buy GPUs, rent a GPU cluster on Azure. Believe me, it is not that expensive.
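Multi-GPU training is usually synchronous data parallelism: each GPU processes an equal shard of the batch, computes gradients on it, and the gradients are averaged before one shared update. The NumPy sketch below illustrates this with a linear model, where "replicas" are just list entries standing in for GPUs. It is an illustration of the idea under that assumption, not how any particular framework implements it.

```python
import numpy as np


def mse_gradient(w, x, y):
    """Gradient of the mean squared error for a linear model y ~ x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(y)


def data_parallel_gradient(w, x, y, num_replicas):
    """Split the batch into equal shards (one per replica/GPU), compute each
    shard's gradient independently, then average them (the all-reduce step)."""
    x_shards = np.array_split(x, num_replicas)
    y_shards = np.array_split(y, num_replicas)
    grads = [mse_gradient(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    return np.mean(grads, axis=0)
```

With equal shard sizes the averaged gradient equals the full-batch gradient, so N GPUs process N times more images per step in roughly the same time. That speeds up training in wall-clock terms; it does not by itself fix a model that fails to converge.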

Hope that helps.

Vu Gia Truong