
I am working with a sudoku grid for which I already have a binary image. I use regionprops to get the area of each connected component and then turn everything outside the expected area range black. After this I call ocr to try to read the sudoku numbers. The problem is that this only works if the sudoku grid in the image is straight and upright. If it is rotated even a little, I am not able to read the numbers. This is the code I have so far:

% get grid connected parts
conn_part = bwconncomp(im_binary);

% blacken area outside
stats = regionprops(conn_part,'Area'); 
im_out = im_binary;  % Make mask
im_out(vertcat(conn_part.PixelIdxList{[stats.Area] < 825 | [stats.Area] > 2500})) = 0; 

imagesc(im_out); 
title("Numbers pulled");
sudokuNum = ocr(im_out,'TextLayout','Block','CharacterSet','0123456789');
sudokuNum.Text;

Where im_binary is the binary image

im_out is the output image

stats is the object returned from regionprops containing the area of the connected components

I know I can rotate the image before getting the OCR results by doing:

im_out = imrotate(im_out, angle)

However, I don't know what angle the grid is at, since this is part of a function that loops over multiple images. I looked into regionprops because it has an 'Orientation' property that I could read, but I don't understand how I would actually use it. The documentation also states that regionprops returns a value between -90 and 90, but my image could be rotated by more than 90 degrees.

tushariyer

1 Answer


Don't rotate the connected component or the binary image. First use the binary image to determine the rotation, then rotate the original grey-scale or color input image, and then binarize the rotated image. Rotating the grey-scale image lets you resample with interpolation, which will improve your results greatly. It does require doing the binarization step twice, but I don't think this step is usually too expensive.
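The rotate-then-binarize step could look roughly like the sketch below. The names deskewAndBinarize, im_gray and grid_angle are hypothetical (the question only shows im_binary), and imbinarize with its default threshold is an assumption standing in for whatever binarization the question already uses.

```matlab
function im_binary_rot = deskewAndBinarize(im_gray, grid_angle)
% Rotate the ORIGINAL grey-scale image, then binarize the result.
%   im_gray    : grey-scale input image (hypothetical name)
%   grid_angle : rotation in degrees, estimated beforehand from the
%                binary image (imrotate treats positive as counterclockwise)

    % Bilinear interpolation is only meaningful on grey values, which is
    % why the rotation is applied before thresholding. 'crop' keeps the
    % output the same size as the input.
    im_rot = imrotate(im_gray, grid_angle, 'bilinear', 'crop');

    % Second binarization pass. Assumption: the default global threshold
    % is good enough; substitute the method used for im_binary if not.
    im_binary_rot = imbinarize(im_rot);
end
```

The corners exposed by the rotation are filled with zeros, so they end up as background after thresholding. If the digits need to be foreground, the result may have to be inverted, as in the original pipeline.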

The regionprops orientation feature is computed by "fitting" an ellipse to the shape. This is meaningful only for elongated objects. For a square sudoku grid this will not yield any valuable information.

Instead, look at the angle at which the smallest Feret diameter was obtained. The Feret diameters are the lengths of the projections at arbitrary angles. At one angle, this projection is smallest. By necessity it will be at an angle corresponding to one of the principal axes of the square. Here is more information about how to compute Feret diameters in MATLAB.
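One possible way to get this angle, assuming R2019a or later where regionprops offers 'MinFeretProperties', is sketched below. The function name feretGridAngle is hypothetical, and the sketch assumes the grid outline is the largest connected component in the binary image.

```matlab
function grid_angle = feretGridAngle(im_binary)
% Estimate the grid rotation from the minimum Feret diameter.
% Assumption: the grid outline is the largest component in im_binary.

    % Keep only the largest component and fill it to get a solid square.
    grid_mask = bwareafilt(logical(im_binary), 1);
    grid_mask = imfill(grid_mask, 'holes');

    % MinFeretAngle is reported in degrees relative to the horizontal axis.
    stats = regionprops(grid_mask, 'MinFeretProperties');

    % The square's geometry only determines the angle modulo 90 degrees,
    % so fold it into the range [-45, 45).
    grid_angle = mod(stats(1).MinFeretAngle + 45, 90) - 45;

    % Caution: whether grid_angle or -grid_angle should be passed to
    % imrotate depends on the sign conventions of both functions; check
    % this once on an image with a known rotation.
end
```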

Another option is to use the Hough transform to detect the lines of the grid.
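A minimal sketch of the Hough approach, using hough and houghpeaks from the Image Processing Toolbox, might look like this. The function name houghGridAngle and the choice of 20 peaks are assumptions, not part of the answer.

```matlab
function grid_angle = houghGridAngle(im_binary)
% Estimate the grid rotation (degrees) from the strongest Hough lines.

    % theta holds the angle of each line's normal, in degrees, [-90, 90).
    [H, theta, ~] = hough(im_binary);

    % Assumption: 20 peaks is enough to capture most of the grid lines.
    peaks = houghpeaks(H, 20);
    line_angles = theta(peaks(:, 2));

    % Horizontal and vertical grid lines differ by 90 degrees. Averaging
    % on the unit circle at 4x the angle makes them agree and avoids
    % wraparound problems near +/-45. The result lies in [-45, 45].
    grid_angle = atan2d(mean(sind(4 * line_angles)), ...
                        mean(cosd(4 * line_angles))) / 4;
end
```

By my reading of the theta convention in hough, passing grid_angle directly to imrotate should level the grid, but it is worth confirming the sign on one test image before relying on it.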

Do note that the geometry of the puzzle will never tell you about which side is up. The angle you get here should be taken modulo π/2 (i.e. constrain to the range -π/4 to π/4).
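Since imrotate works in degrees, the same constraint reads "modulo 90, in the range -45 to 45". A small illustration (the helper name wrapTo45 is made up for this example):

```matlab
% Fold any angle in degrees into [-45, 45), i.e. take it modulo 90.
wrapTo45 = @(angle_deg) mod(angle_deg + 45, 90) - 45;

wrapTo45(100)   % 10: a 100 degree tilt looks the same as a 10 degree tilt
wrapTo45(-50)   % 40
wrapTo45(30)    % 30: already in range, unchanged
```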

To find out which direction is up, you could try to read the text; if that fails, rotate by 90 degrees and try again.
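One way to sketch this retry loop is to run ocr on all four 90-degree orientations and keep the one with the highest mean character confidence. The function name readUpright and the use of confidence as the "did it fail" criterion are assumptions; the ocr options are copied from the question.

```matlab
function [best_text, best_k] = readUpright(im_deskewed)
% Try the four 90-degree orientations of an already deskewed image and
% keep the OCR result with the highest mean character confidence.
%   best_k : number of counterclockwise quarter turns that was applied

    best_conf = -Inf;
    best_text = '';
    best_k = 0;

    for k = 0:3
        candidate = rot90(im_deskewed, k);
        result = ocr(candidate, 'TextLayout', 'Block', ...
                     'CharacterSet', '0123456789');

        % Spaces and newlines have NaN confidence; an empty result
        % also yields NaN and is skipped below.
        conf = mean(result.CharacterConfidences, 'omitnan');

        if ~isnan(conf) && conf > best_conf
            best_conf = conf;
            best_text = result.Text;
            best_k = k;
        end
    end
end
```

Confidence is only a heuristic here: digits such as 6 and 9 read plausibly upside down, so an extra check (for example, that every detected digit is valid for its cell) may be needed.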

Cris Luengo
  • Alright, but how would I know how much to rotate each image by? When it is a grey scale image, I don't know the orientation of the grid, since this function is going to be used on a variety of images and takes no user input. – tushariyer Apr 17 '18 at 13:42
  • No, you determine the orientation based on the detected grid (after binarization and whatever else processing), then you rotate the input image and binarize again. – Cris Luengo Apr 17 '18 at 15:21