
I am trying to get the locations of the options in some images of questions using OpenCV's matchTemplate. I tried OCR with bounding boxes, but it takes nearly 10 seconds to compute, so I decided to try matchTemplate instead. It is a lot faster, but not very accurate. Here are my images and my code:

const cv = require('@u4/opencv4nodejs');
const fs = require('fs');
const color = new cv.Vec3(255, 255, 255);

async function opencvGetPositions(imageData, path, answers) {
  const mat = cv.imdecode(imageData); // imdecode yields a BGR image
  let modifiedMat = mat.cvtColor(cv.COLOR_BGR2GRAY);
  modifiedMat = modifiedMat.threshold(0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU);
  modifiedMat = modifiedMat.bitwiseNot();

  // answers is an array of cv.Mat converted to grayscale
  for (let i = 0; i < answers.length; i++) {
    const ww = answers[i].sizes[1]; // template width
    const hh = answers[i].sizes[0]; // template height
    const matched = modifiedMat.matchTemplate(answers[i], cv.TM_SQDIFF);
    // with TM_SQDIFF the best match is the global minimum
    const loc = matched.minMaxLoc().minLoc;

    const pt1 = new cv.Point(loc.x, loc.y);
    const pt2 = new cv.Point(loc.x + ww, loc.y + hh);
    modifiedMat.drawRectangle(pt1, pt2, color, 2);
  }

  cv.imwrite(path + '/output.png', modifiedMat);
}

I'm using Node.js and the @u4/opencv4nodejs package.

The answers array consists of these images:

[five template images, one per option]

Applied to:

[question image with mostly accurate matches]

Matching on this image is mostly accurate, probably because I cropped the option templates from it. I suspect there are slight differences in the options across questions, which may cause the inaccuracy.

[question image with inaccurate matches]

But most of the images are very inaccurate, like this one.

So is there a better way to do this, or some way to make matchTemplate more accurate?

And here are the images without any modifications:

[the two original, unmodified question images]

BleedFeed
  • You should not take all the local minima, pick only those that are below a certain threshold. That is, the value of the local minimum is how well the template matched, with smaller values meaning better match (because you compute the mean square error between the template and the image). The lower the threshold, the fewer false positives you get, but also the more false negatives you get. So you need to tweak it a bit to find a good compromise. – Cris Luengo Jul 08 '23 at 15:56
  • Question is not very clear and too complicated and the issue is not clear. Post one image and one template and your results with match scores. If the image and template are rotated relative to each other or have different scales, then template matching may not work. If the backgrounds are not the same, then you might need to use a mask image. – fmw42 Jul 08 '23 at 15:58
  • I changed the matchTemplate mode to `TM_SQDIFF_NORMED` and printed minVal on the image (https://imgur.com/a/xAi8oT4); this is what they look like now. It seems to find two options at the same position. Can you tell me how I can do the thresholding? – BleedFeed Jul 08 '23 at 18:32
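
A minimal sketch of the thresholding described in the first comment: instead of keeping only the global minimum per template, collect every location whose TM_SQDIFF_NORMED score falls below a cutoff. The 0.15 cutoff and the 10 px suppression radius here are assumptions to tune:

// keep every match below a score cutoff instead of only the global minimum
const THRESHOLD = 0.15; // lower => fewer false positives, more misses
const MIN_DIST = 10;    // skip hits this close to an already accepted one (px)

const result = modifiedMat.matchTemplate(answers[i], cv.TM_SQDIFF_NORMED);
const scores = result.getDataAsArray(); // 2D array of float match scores
const hits = [];
for (let y = 0; y < scores.length; y++) {
  for (let x = 0; x < scores[y].length; x++) {
    if (scores[y][x] > THRESHOLD) continue;
    // crude duplicate suppression: ignore points near an accepted hit
    if (hits.some(h => Math.abs(h.x - x) < MIN_DIST && Math.abs(h.y - y) < MIN_DIST)) continue;
    hits.push({ x, y, score: scores[y][x] });
  }
}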

1 Answer


Fixed it by standardizing the option images to fit all the questions and by matching at multiple scales: I run matchTemplate() on several resized versions of the question image in a loop and keep the best result.

let found = {};
// try several scales of the question image and keep the best match
for (const scale of linspace(0.7, 1.1, 30)) {
  const resizedQuestion = whiteSpacedMat.resize(0, 0, scale, scale);
  const scaleRatio = whiteSpacedMat.sizes[0] / resizedQuestion.sizes[0];
  // stop once the resized image is smaller than the template
  if (resizedQuestion.sizes[0] < grayAnswer.sizes[0] || resizedQuestion.sizes[1] < grayAnswer.sizes[1]) {
    break;
  }
  const edged = resizedQuestion.canny(50, 200);
  const result = edged.matchTemplate(answerEdges, cv.TM_SQDIFF_NORMED);
  const { minVal, minLoc } = result.minMaxLoc();
  if (typeof found.val === 'undefined' || found.val > minVal) {
    found = { val: minVal, loc: minLoc, scale: scaleRatio };
  }
}
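
The snippet calls a linspace helper that isn't defined anywhere above; a minimal version returning n evenly spaced values could look like this:

// n evenly spaced values from start to stop, inclusive
function linspace(start, stop, n) {
  const step = (stop - start) / (n - 1);
  return Array.from({ length: n }, (_, i) => start + i * step);
}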

And I am now matching on Canny edge images instead of the thresholded binary ones; I don't know whether that contributed to the accuracy.
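
For reference, the answerEdges template in the loop above would then presumably be the edge image of the grayscale option template, something like this (the 50/200 Canny thresholds mirror the ones applied to the question image):

// edge image of the option template, using the same Canny settings
const answerEdges = grayAnswer.canny(50, 200);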

After the loop it was a lot better, but it still had some issues, like confusing option B with E, and there was still some inaccuracy. So I figured that if I somehow standardized the spaces around the options, it would be easier to find them. I first added 30 pixels to the left of the image, because option A) didn't have much space on its left:

// copyMakeBorder returns a new Mat rather than modifying in place
const paddedMat = mat.copyMakeBorder(0, 0, 30, 0, cv.BORDER_CONSTANT, new cv.Vec3(255, 255, 255));

Then I added some space between the text lines (the result is shown below; a hypothetical sketch of this step follows the image) and increased the spaces around the template images. Now it does not mistake other text for an option, because it is looking for text areas that have a lot of whitespace around them.

[question image with extra spacing between text lines]
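
The answer doesn't include code for this line-spacing step, so this is only a hypothetical sketch of one way it could work: find runs of rows containing dark pixels, then paste each run into a taller white canvas with a fixed gap between runs. The function name, the gap value, and the 128 threshold are all assumptions:

// Hypothetical sketch: insert a fixed white gap between detected text lines.
// Assumes `mat` is a grayscale image with dark text on a white background.
function spaceOutLines(mat, gap = 20) {
  const cols = mat.cols;
  // invert so dark text pixels become nonzero and can be counted per row
  const inverted = mat.threshold(128, 255, cv.THRESH_BINARY_INV);
  const textRow = [];
  for (let y = 0; y < mat.rows; y++) {
    textRow.push(inverted.getRegion(new cv.Rect(0, y, cols, 1)).countNonZero() > 0);
  }
  // collect [start, end) runs of consecutive text rows
  const runs = [];
  for (let y = 0; y < textRow.length; y++) {
    if (textRow[y] && (y === 0 || !textRow[y - 1])) runs.push([y, y + 1]);
    else if (textRow[y]) runs[runs.length - 1][1] = y + 1;
  }
  // paste each run into a taller white canvas, `gap` pixels apart
  const outHeight = runs.reduce((h, [s, e]) => h + (e - s) + gap, gap);
  const out = new cv.Mat(outHeight, cols, mat.type, 255);
  let y = gap;
  for (const [s, e] of runs) {
    const line = mat.getRegion(new cv.Rect(0, s, cols, e - s));
    line.copyTo(out.getRegion(new cv.Rect(0, y, cols, e - s)));
    y += (e - s) + gap;
  }
  return out;
}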

And this is what the results look like:

[two result images with the options correctly located]

BleedFeed
  • Don't use the normed error measure; it norms in part on the content of the image, which you don't want. You can manually norm to the template if you need to compare error measures for different templates. – Cris Luengo Jul 09 '23 at 17:53
  • Do I have to normalize the output from `TM_SQDIFF` if I just compare them? I wanted to know the accuracy by normalizing it. And how can I manually normalize the values? – BleedFeed Jul 09 '23 at 18:03
  • Look at the equations used in the docs: https://docs.opencv.org/2.4/modules/imgproc/doc/object_detection.html?highlight=matchtemplate#matchtemplate — you want to normalize by the root of the sum of the squares of the template. That will make a match for one template comparable to a match for another one. – Cris Luengo Jul 09 '23 at 18:16
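
A sketch of the normalization suggested in the last comment, assuming the loop above is switched back to raw cv.TM_SQDIFF and grayAnswer is the grayscale template; Mat.norm() with NORM_L2 gives the root of the sum of squared values:

// root of the sum of the squares of the template (its L2 norm)
const templateNorm = grayAnswer.norm(cv.NORM_L2);
// dividing by it makes scores for different templates comparable
const normalizedScore = found.val / templateNorm;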