Yolo object detection: include images that do not contain classes to be predicted?

Question

I want to train tiny yolo on my own dataset. I want to predict 3 classes: cars, pedestrians and cyclists; all of these have been annotated.

My dataset also includes images that do not contain these classes (hence no annotations). Should I include these images in the training? Why or why not?

Thank you!

Then what's the point of putting those unwanted and unlabeled images in your dataset? — gameon67, Mar 18 '19 at 00:25
I’m voting to close this question because it's about ML theory, not programming. — cigien, Apr 07 '21 at 19:55

j314erre · Answer 1 · 2021-06-25T00:56:34.507

TL;DR: You usually don't need to provide images that contain none of your classes for YOLO.

YOLO divides the output layer into a grid of cells. Each cell has one or more anchor box priors, and each prior predicts class scores along with an objectness score (object vs. "no object").

Since most images won't contain objects in each and every grid cell, the network will naturally learn how to identify "no object" from the empty cells of your annotated images.

In fact, there is usually an imbalance: far more anchors contain no object than contain one. That is why YOLO uses a joint loss function that down-weights the negative examples with λ_noobj = 0.5.
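To make the down-weighting concrete, here is a minimal sketch of just the confidence part of such a loss. It is a simplification, not the actual Darknet code: it assumes a target confidence of 1 for predictors that own an object and 0 otherwise, and the function name `confidence_loss` is made up for illustration.

```python
LAMBDA_NOOBJ = 0.5  # the weight quoted above for "no object" predictors


def confidence_loss(pred_conf, has_object, lambda_noobj=LAMBDA_NOOBJ):
    """Simplified sum-of-squares confidence loss.

    pred_conf:  predicted objectness per predictor, each in [0, 1]
    has_object: True where a ground-truth object is assigned to that predictor

    Assumption: the target is 1.0 with an object and 0.0 without.
    """
    obj_term = 0.0
    noobj_term = 0.0
    for conf, obj in zip(pred_conf, has_object):
        if obj:
            obj_term += (1.0 - conf) ** 2
        else:
            noobj_term += (0.0 - conf) ** 2
    # Negatives are far more numerous, so their sum is scaled down.
    return obj_term + lambda_noobj * noobj_term


# One positive and one negative predictor, both predicting 0.5:
print(confidence_loss([0.5, 0.5], [True, False]))  # 0.25 + 0.5 * 0.25 = 0.375
```

Note that in this sketch an image with no objects at all still produces a loss, but only through the down-weighted "no object" term.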

Other approaches, such as SSD, use "hard negative mining" to reduce the number of negative examples and address the imbalance.

Therefore you do not usually need to include pure negative training examples, since your dataset of positive examples will already contain an excess of negative grid cells.

One exception I can think of: if all your training examples contain many objects across the entire field of view (e.g. crowds, traffic jams, etc.), then you might need to include some training examples without objects.

Another exception: if your objects always appear in the same grid cell (e.g. the center), then you might need some pure negative examples, or you could use data augmentation to generate examples with objects appearing in different places.
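One way to check whether either exception applies to your dataset is to count which grid cells your object centers fall into. This is a rough sketch under assumptions: it expects Darknet-style label lines (`class x_center y_center width height`, normalized to 0..1) and a 13x13 grid, and the function names are hypothetical helpers, not part of any YOLO codebase.

```python
from collections import Counter


def occupied_cells(label_lines, grid_size=13):
    """Return the set of (row, col) cells containing an object center."""
    cells = set()
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        x_center, y_center = float(parts[1]), float(parts[2])
        # Clamp so a coordinate of exactly 1.0 stays inside the grid.
        col = min(int(x_center * grid_size), grid_size - 1)
        row = min(int(y_center * grid_size), grid_size - 1)
        cells.add((row, col))
    return cells


def cell_histogram(all_label_lines, grid_size=13):
    """Count, per cell, how many images have an object center there."""
    counts = Counter()
    for lines in all_label_lines:
        counts.update(occupied_cells(lines, grid_size))
    return counts


# Two images with a centered object and one image with no objects:
labels = [["0 0.5 0.5 0.1 0.1"], ["1 0.5 0.5 0.2 0.2"], []]
print(cell_histogram(labels))  # Counter({(6, 6): 2})
```

If nearly every cell is occupied in nearly every image, or only a handful of cells are ever occupied, that suggests one of the two exceptions above.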

wouterio · Answer 2 · 2021-04-07T20:26:24.263

Besides providing training data that shows what your model should detect, it may also be helpful to provide negative data that shows what it should not detect. An image without annotations implicitly says that nothing in it is something the model should detect.

Let's say you're training a model to detect yellow cabs. Of course you provide data with yellow cabs. But it also makes sense to include negative data containing yellow objects that aren't cabs as well as cars that aren't cabs. This prevents your network from learning that anything yellow is a cab, or any car is a cab.

Neural networks are a bit of a black box, but from a theoretical viewpoint you could say that they somehow extract certain abstract features from their input. Based on the extracted features they determine (for instance) an object's class and position.

Training a neural network then means that the network learns to find abstract features relevant for determining the class and position of objects. The nature of neural networks makes it hard to understand what features they are learning. All we can see is that the neural network starts to behave according to how we train it.

Without negative data a network may learn features that are too abstract. The network may then find those features in other objects it should not detect. For instance, in our team we were training a YOLO network to detect certain plants, and at one point we found that one of our networks also detected plants in an image containing nothing but blocks.

Negative data provides more feedback for learning features. If the network starts learning features that are too abstract during training, chances are that it will start to detect objects in the negative data. The training algorithm then sees that the network falsely detects objects and provides corrective feedback.

Rohit refers to AlexeyAB's GitHub page, which states that you should provide as many images of negative samples as you provide images with objects. Since AlexeyAB is one of the main contributors to YOLO, it probably won't hurt to follow his advice, unless you have clear evidence he's wrong.

score 4 · Answer 3 · answered Feb 28 '21 at 17:25

It is recommended to have images without objects, but I am not exactly sure what the actual reason for it is.

desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without bounded box (empty .txt files) - use as many images of negative samples as there are images with objects

https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
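In practice, "empty .txt files" means each negative image gets a label file with nothing in it. Here is a small sketch for creating them, assuming the label file sits next to the image with the same base name; the function name and the list of image extensions are my own choices for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def add_empty_labels(image_dir):
    """Create an empty .txt label for every image that has none.

    Existing label files are left untouched. Returns the created paths.
    """
    created = []
    # sorted() materializes the listing before any files are created.
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label_path = image_path.with_suffix(".txt")
        if not label_path.exists():
            label_path.touch()  # empty file = negative sample
            created.append(label_path)
    return created
```

Run it only on a folder where a missing label really means "no objects"; an image you simply haven't annotated yet would otherwise be treated as a negative sample.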
