I am new to computer vision but I am trying to code an android/ios app which does the following:
Get the live camera preview and try to detect one flat image (logo or painting) in that. In real-time. Draw a rect around the logo if found. If there is no match, dont draw the rectangle.
I found the Tensorflow Object Detection API as a good starting point. And support was just announced for importing TensorFlow models into Core ML.
I followed a lot of tutorials to train my own object detector. The training data is the key. I found a pretty good library to generate augmented image. I have created hundreds of variation of my image source (rotation, skew etc ...). But it has failed! This dataset is probably good for image classification (with my image in full screen) but not in context (the room).
I think transfer-learning is the key, In my case, I used the ssd_mobilenet_v1_coco model as a base. I tried to fake the context of my augmented image with the Random Erasing Data Augmentation technique without success.
What are my available solutions? Do I tackle the problem rightly? I need to make the model training as fast as possible.
May I have to use some datasets for indoor-outdoor image classification and put my image randomly above? How important are the perspectives?
Thank you!