Let's say I have 3 images (an apple, an orange, a banana) and another 1000 arbitrary images. What I want to do is to see if those 1000 arbitrary images contain object(s) similar to the former 3 images and, if so, draw a bounding box around each of those objects. However, none of these 1003 images or objects is labelled or annotated.

I have done some research on the internet and tried to find a suitable deep learning object detection approach (e.g. Faster R-CNN, YOLOv3), but I couldn't work out how they relate to my task.

I have also noticed a technique called template matching, but it does not seem to be closely related to deep learning.

So my questions are:

Is there any good approach or deep learning model that could meet my needs?

Will I benefit from any pre-trained Faster R-CNN or YOLOv3 models? (e.g. if they were trained on image sets of cars, people, dogs, and cats, will the features they learned also apply to a new domain?)

ML85
Trevor. W

1 Answer

"What I want to do is to see if those 1000 arbitrary images contain object(s) similar to the former 3 images"

What did you mean by "similar"?

If you meant "I want to see if the 1000 images contain objects from the target classes: orange, apple, and banana", then here's the answer:

  • If your models were pre-trained on your target classes (orange, apple, and banana), then you can use those pre-trained models directly to detect the objects in your 1003 images. You can just select orange, apple, and banana as the class names in the configuration.

  • If your pre-trained models weren't trained on your target classes and you only have your 1003 images, you will need to do what is called fine-tuning: continuing to train the pre-trained model on your own data, often updating only its last layers. Fine-tuning a detector needs bounding-box annotations, so you would first have to label your images. 1003 images might not be enough for training the model, so you might need to perform data augmentation to expand your data. Also, consider making your classes balanced (meaning having roughly the same number of objects per class).
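As a rough illustration of the augmentation and class-balance points above, here is a minimal sketch in plain Python. It assumes boxes are stored as [x_min, y_min, x_max, y_max] in pixels; the helper names `hflip_boxes` and `class_counts` and the sample annotations are hypothetical, not part of any detection library. It shows how boxes must be mirrored when an image is flipped horizontally, and how to count objects per class.

```python
from collections import Counter


def hflip_boxes(boxes, image_width):
    """Mirror [x_min, y_min, x_max, y_max] boxes to match a horizontally flipped image."""
    flipped = []
    for x_min, y_min, x_max, y_max in boxes:
        # The old right edge becomes the new left edge, and vice versa.
        flipped.append([image_width - x_max, y_min, image_width - x_min, y_max])
    return flipped


def class_counts(annotations):
    """Count annotated objects per class so imbalance is easy to spot."""
    return Counter(label for label, _box in annotations)


# Hypothetical annotations for one image that is 100 pixels wide: (label, box).
annotations = [
    ("apple", [10, 20, 40, 60]),
    ("banana", [50, 5, 90, 30]),
    ("apple", [0, 0, 10, 10]),
]

boxes = [box for _label, box in annotations]
flipped = hflip_boxes(boxes, image_width=100)
counts = class_counts(annotations)

print(flipped)
print(counts)
```

The image pixels themselves would have to be flipped as well (for example with an image library); only the box arithmetic is shown here. Flipping is just one of many possible augmentations.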

For something close to a "similarity score," you can use the confidence score for class x, which is the model's estimate of how likely it is that the bounding box contains an object of class x. However, this confidence score depends heavily on how well the model was trained on class x. For example, different models may give different confidence scores for the same image, and the same model may give different scores for the same object under different angles, lighting, and orientations. Thus, it might be a better idea to fine-tune the models anyway so that they are more robust to varied appearances of your target classes.
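To make the use of confidence scores concrete, here is a minimal sketch that keeps only detections of the target classes above a score threshold. The list-of-dicts detection format, the function name `keep_target_detections`, the sample values, and the 0.5 threshold are all assumptions for illustration; a real detector returns its own output structure, which you would convert first.

```python
TARGET_CLASSES = ("apple", "orange", "banana")


def keep_target_detections(detections, target_classes=TARGET_CLASSES, min_score=0.5):
    """Keep detections of the target classes whose confidence is at least min_score."""
    kept = [
        d for d in detections
        if d["label"] in target_classes and d["score"] >= min_score
    ]
    # Most confident detections first.
    return sorted(kept, key=lambda d: d["score"], reverse=True)


# Hypothetical detector output for one image.
detections = [
    {"label": "orange", "score": 0.62, "box": [120, 40, 180, 100]},
    {"label": "dog", "score": 0.88, "box": [5, 5, 60, 70]},
    {"label": "banana", "score": 0.35, "box": [200, 10, 260, 50]},
    {"label": "apple", "score": 0.91, "box": [12, 30, 80, 95]},
]

kept = keep_target_detections(detections)
for d in kept:
    print(d["label"], d["score"], d["box"])
```

With these sample values, the dog is dropped because it is not a target class, and the banana is dropped because its score is below the threshold. The threshold is a tuning choice: lowering it finds more objects at the cost of more false positives.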