
I'm trying to perform object detection with R-CNN on my own dataset, following the tutorial on the MATLAB website. Based on the picture below:

(screenshot of the training data table from the MATLAB tutorial)

I'm supposed to put image paths in the first column and the bounding box of each object in the following columns. But in each of my images, there is more than one object of each kind. For example there are 20 vehicles in one image. How should I deal with that? Should I create a separate row for each instance of vehicle in an image?

Hadi GhahremanNezhad

2 Answers


The example found on the website finds the pixel neighbourhood with the largest score and draws a bounding box around that region in the image. Having multiple objects complicates things. There are two approaches you can use to find multiple objects.

  1. Find all bounding boxes with scores that surpass some global threshold.
  2. Find the bounding box with the largest score, then keep any bounding boxes whose scores surpass some percentage of that maximum score. The percentage is arbitrary, but from experience and what I have seen in practice, people tend to choose between 80% and 95% of the largest score found in the image. This will of course give you false positives if you submit a query image containing objects the classifier was not trained to detect, so you will have to implement some additional post-processing logic on your end.

An alternative approach would be to choose some value k and display the top k bounding boxes associated with the k highest scores. This of course requires that you know the value of k beforehand, and, like the second approach, it always assumes that at least one object was found in the image.
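The top-k alternative can be sketched as follows. This is a minimal sketch, not part of the original answer; the variable names (`score`, `bboxes`, `label`) are assumed to match the outputs of the `detect` call shown later in this answer.

```matlab
% Sketch of the top-k approach: keep only the k highest-scoring detections.
% Assumes score, bboxes and label come from detect() as in the examples below.
k = 5;                                        % chosen beforehand
[~, sortOrder] = sort(score, 'descend');      % rank detections by score
topIdx = sortOrder(1 : min(k, numel(score))); % guard against fewer than k boxes

topBoxes  = bboxes(topIdx, :);                % k best boxes, one [x y w h] per row
topScores = score(topIdx);
topLabels = label(topIdx);
```

Note that `min(k, numel(score))` keeps the sketch from indexing past the end when the detector returns fewer than k boxes.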


In addition to the above logic, the approach you state, where you create a separate row for each instance of vehicle in the image, is correct. This means that if you have multiple instances of an object in a single image, you introduce one row per instance while keeping the image filename the same. Therefore, if you had for example 20 vehicles in one image, you would create 20 rows in your table where the filename is the same in every row, with a single bounding box specification for each distinct object in that image.
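A minimal sketch of the table layout described above, with one row per object instance. The filenames and box coordinates here are hypothetical placeholders, not from the original question:

```matlab
% Sketch: one row per object instance, repeating the image filename.
% Each cell holds a single [x, y, width, height] bounding box.
trainingData = table( ...
    {'image1.jpg'; 'image1.jpg'; 'image1.jpg'}, ...      % same image, 3 vehicles
    {[10 20 50 30]; [80 40 55 32]; [150 60 48 28]}, ...  % one box per row
    'VariableNames', {'imageFilename', 'vehicle'});
```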

Once you have done this, and assuming you have already trained the R-CNN detector and want to use it, the original code from the website to detect objects is the following:

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128);

% Display the detection results
[score, idx] = max(score);

bbox = bboxes(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

outputImage = insertObjectAnnotation(testImage, 'rectangle', bbox, annotation);

figure
imshow(outputImage)

This only works for one object which has the highest score. If you wanted to do this for multiple objects, you would use the score that is output from the detect method and find those locations that either accommodate situation 1 or situation 2.

If you had situation 1, you would modify it to look like the following.

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128);

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;

% Retrieve those scores that surpassed the threshold
s = score(idx);

% Do the same for the labels as well
lbl = label(idx);

bbox = bboxes(idx, :); % This logic doesn't change

% New - Loop through each box and print out its confidence on the image
outputImage = testImage; % Make a copy of the test image to write to
for ii = 1 : size(bbox, 1)
    annotation = sprintf('%s: (Confidence = %f)', lbl(ii), s(ii)); % New - Use the thresholded labels and scores
    outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox(ii,:), annotation); % New - Choose the right box
end

figure
imshow(outputImage)

Note that I've kept the original bounding boxes, labels and scores in their original variables, and stored the subset that surpassed the threshold in separate variables, in case you want to cross-reference between the two. If you wanted to accommodate situation 2, the code remains the same as situation 1 with the exception of how the threshold is defined.

The code from:

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;
% [score, idx] = max(score);

... would now change to:

% New - Find those bounding boxes that surpassed a threshold
perc = 0.85; % 85% of the maximum threshold
T = perc * max(score); % Define threshold here
idx = score >= T;

The end result will be multiple bounding boxes of the detected objects in the image - one annotation per detected object.

rayryeng
    Thank you for your complete response. It was very helpful. I just have a question about the performance. What happens if the threshold isn't the best for some images? Would that result in missing some of the objects and leave them undetected? – Hadi GhahremanNezhad Mar 21 '17 at 16:26
  • @HadiGhahremanNezhad That is correct. The threshold is purely experimental. You have to look at the kinds of images you have as well as the objects you have and run several trials with different thresholds. You would then choose the threshold that results in the highest precision and recall. – rayryeng Mar 21 '17 at 16:34
  • Thank you, the tips were really great. But this is still tricky, as I have **11** types of objects and there could be multiple instances of each of them in a single image. So the creation of the data cell isn't easy and I'm confused about how many rows it should have for each image. I have searched for a proper tool for object detection with deep learning and haven't found a simple one yet. I tried **Caffe**, **Nvidia digits** and now MATLAB, no luck yet! None of them has worked for me. – Hadi GhahremanNezhad Mar 21 '17 at 17:54
  • Caffe has worked for me in the past. At this point I would also recommend TensorFlow. It has a Faster R-CNN module that's already prebaked which you can use to help train for your data. – rayryeng Mar 21 '17 at 17:56
  • I am going to try your instructions on a single type object dataset. One more question. I did object classification with Caffe, but for object detection the datasets are different. They have images with multiple objects with assigned bounding boxes. How did you manage to create the **LMDB** format from this kind of datasets? – Hadi GhahremanNezhad Mar 21 '17 at 18:49
  • I primarily use the Python interface and I used this code written by Evan Shelhamer here: https://github.com/BVLC/caffe/issues/1698#issuecomment-70211045. Note that if you want to use floating-point data, you have to access the `float_data` parameter in the Datum object: https://github.com/BVLC/caffe/issues/4109. – rayryeng Mar 21 '17 at 18:52

I think you actually have to put all of the coordinates for that image as a single entry in your training data table. See this MATLAB tutorial for details. If you load the training data into MATLAB locally and check the vehicleDataset variable, you will actually see this (sorry, my score is not high enough to include images directly in my answers).

To summarize, in your training data table, make sure you have one unique entry for each image, and put however many bounding boxes into the corresponding category as a matrix, where each row is in the format of [x, y, width, height].
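A minimal sketch of the format this answer describes, with one row per image and all boxes stacked into one M-by-4 matrix. The filename and coordinates are hypothetical placeholders:

```matlab
% Sketch: one unique row per image; all of that image's boxes go into
% a single M-by-4 matrix, each row being [x, y, width, height].
trainingData = table( ...
    {'image1.jpg'}, ...
    {[10 20 50 30; 80 40 55 32; 150 60 48 28]}, ...  % 3 vehicles in one image
    'VariableNames', {'imageFilename', 'vehicle'});
```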

Yanfeng Liu
  • based on the picture you put, it means, for example if I had 5 vehicles in a single image, I should put all their bounding box info in a single row and separate them with `;`. Is this right? – Hadi GhahremanNezhad Jul 17 '17 at 10:44
  • 1
    @HadiGhahremanNezhad Yes, but in MATLAB that `;` would be interpreted as a new row for the matrix. So you are actually creating a multi-row matrix with each row being a bounding box. – Yanfeng Liu Jul 17 '17 at 13:46