
I've just realized that running OCR only on the regions that contain text would be a lot faster. So I first detect the text regions in the image and then run OCR on each of them. This is the result of the "detecting text regions" step, using OpenCV (I used it to draw the rectangles on the image):

Text regions detecting

The only remaining problem is that I can't arrange the recognized text in the order in which it appears in the original image. In this case, it should be:

circle oval triangle square trapezium
diamond rhombus parallelogram rectangle pentagon
hexagon heptagon octagon nonagon decagon

Some other cases:

Basically any other images that have text on them.

Paper

In-app purchase - Messy Mary: Finding objects

So I'm trying to sort the array of rectangles (origin point, width, and height) and then rearrange the text associated with them.

Further information

I don't know if it's necessary, but here is the code I used:

How I detected the text regions

+ (NSMutableArray *)detectLetters:(UIImage *)image
{
    cv::Mat img;
    UIImageToMat(image, img);
    if (img.channels() != 1) {
        NSLog(@"NOT A GRAYSCALE IMAGE! CONVERTING TO GRAYSCALE.");
        cv::cvtColor(img, img, CV_BGR2GRAY);
    }

    // The array of text regions (rectangles)
    NSMutableArray *array = [[NSMutableArray alloc] init];

    cv::Mat img_gray = img, img_sobel, img_threshold, element;

    // Edge detection (first derivative in x)
    cv::Sobel(img_gray, img_sobel, CV_8U, 1, 0, 3, 1, 0, cv::BORDER_DEFAULT);

    // Binarize using Otsu's threshold
    cv::threshold(img_sobel, img_threshold, 0, 255, CV_THRESH_OTSU + CV_THRESH_BINARY);

    // Close the gaps between neighboring letters with a wide, flat kernel
    element = getStructuringElement(cv::MORPH_RECT, cv::Size(17, 3));
    cv::morphologyEx(img_threshold, img_threshold, CV_MOP_CLOSE, element);

    // Find the contours: mode 0 (external contours only), method 1 (no approximation)
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(img_threshold, contours, 0, 1);

    std::vector<std::vector<cv::Point> > contours_poly(contours.size());

    for (size_t i = 0; i < contours.size(); i++) {
        if (contours[i].size() > 50) {
            cv::approxPolyDP(cv::Mat(contours[i]), contours_poly[i], 3, true);
            cv::Rect appRect(boundingRect(cv::Mat(contours_poly[i])));
            // Keep only the regions that are wider than they are tall
            if (appRect.width > appRect.height) {
                [array addObject:[NSValue valueWithCGRect:CGRectMake(appRect.x, appRect.y, appRect.width, appRect.height)]];
            }
        }
    }

    return array;
}

This is the OCR process (using Tesseract):

NSMutableArray *arr=[STOpenCV detectLetters:img];

CFTimeInterval totalStartTime = CACurrentMediaTime();
NSMutableString *res=[[NSMutableString alloc] init];

for(int i=0;i<arr.count;i++){
    NSLog(@"\n-------------\nPROCESSING REGION %d/%lu",i+1,(unsigned long)arr.count);

    //Set the OCR region using the result from last step
    tesseract.rect=[[arr objectAtIndex:i] CGRectValue];


    CFTimeInterval startTime = CACurrentMediaTime();

    NSLog(@"Start to recognize: %f",startTime);

    [tesseract recognize];

    NSString *result=[tesseract recognizedText];

    NSLog(@"Result: %@", result);
    [res appendString:result];

    CFTimeInterval elapsedTime = CACurrentMediaTime() - startTime;

    NSLog(@"FINISHED: %f", elapsedTime);
}
FlySoFast
  • Is this your reference image? Or do you have more complex images? Anyhow, post your original image(s), so we can try on them and hopefully come back to you with an accurate answer – Miki Jul 28 '15 at 23:07
  • Thanks @Miki. I added some more images. Basically it could be any image that has text. – FlySoFast Jul 29 '15 at 00:59
  • associating words in a line with each other is a subtask of inferring "document structure". you can do that with nearest-neighbor queries (find nearest box) and associate (graph: nodes, edges) those boxes that are closest and roughly at the same y-coordinate. -- a clever comparison function, given to a generic sorting algorithm, might do the trick as well. it would involve a case for distinguishing lines, and a case for distinguishing position in a line for a pair that is in the same line. -- what's done below looks a little weird but might be equivalent. – Christoph Rackwitz Apr 01 '22 at 12:09

1 Answer


What you want is to sort the array of rects by y position (y - height/2) and then, when two rects are on the same horizontal line, by x position (x - width/2).

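NSArray *sortedRects;
sortedRects = [unsortedRects sortedArrayUsingComparator:^NSComparisonResult(id a, id b) {
    // The array holds NSValue-wrapped CGRects, so unwrap them first.
    CGRect first = [(NSValue *)a CGRectValue];
    CGRect second = [(NSValue *)b CGRectValue];
    // Compare the y positions first.
    CGFloat yDifference = (first.origin.y - first.size.height / 2.0) - (second.origin.y - second.size.height / 2.0);
    if (yDifference != 0) {
        return yDifference < 0 ? NSOrderedAscending : NSOrderedDescending;
    }
    // Same y position: fall back to the x positions.
    CGFloat xDifference = (first.origin.x - first.size.width / 2.0) - (second.origin.x - second.size.width / 2.0);
    if (xDifference == 0) return NSOrderedSame;
    return xDifference < 0 ? NSOrderedAscending : NSOrderedDescending;
}];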
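
To also reorder the recognized text, it may help to keep each rectangle together with its OCR result and sort the pairs. Below is a minimal sketch in plain C++ so it can run without UIKit. The `Region` struct and the `sortByYThenX` and `joinText` helpers are hypothetical names, not part of OpenCV or Tesseract, and the sketch assumes that (x, y) is the top-left corner with y growing downward. It only breaks exact ties in y.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical pairing of a detected region with the text recognized in it.
// Assumption: (x, y) is the top-left corner and y grows downward.
struct Region {
    double x, y, width, height;
    std::string text;
};

// Sorts top to bottom; regions with exactly the same y are ordered left to right.
void sortByYThenX(std::vector<Region>& regions) {
    std::sort(regions.begin(), regions.end(), [](const Region& a, const Region& b) {
        return std::tie(a.y, a.x) < std::tie(b.y, b.x);
    });
}

// Joins the text of the regions in their current order, separated by spaces.
std::string joinText(const std::vector<Region>& regions) {
    std::string out;
    for (const Region& r : regions) {
        if (!out.empty()) out += ' ';
        out += r.text;
    }
    return out;
}
```

Calling `sortByYThenX` on the collected regions and then `joinText` would give the text in sorted order, as long as rectangles on one line report exactly the same y.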
Patrick
Jozef Legény
  • Yeah, but only in a perfect world, because even when the rectangles are "on the same line" to our eyes, they don't always have the same position value – FlySoFast Jul 27 '15 at 07:47
  • In that case you can add some epsilon value to the yDifference comparison. Instead of checking for < 0 you could check for < 5 (for example) and then (fabs(yDifference) < 5 && ...). – Jozef Legény Jul 27 '15 at 08:37
  • That sounds about right. Actually I thought about it before, but I got stuck trying to find a perfect epsilon value for all of the text blocks. I thought about using the average of the blocks' heights as the epsilon value, but it seems like that would be inaccurate if there are some big blocks among the small ones. It's really close! – FlySoFast Jul 28 '15 at 00:44
  • You can also invert the process and use X first and Y second, in this case you would probably need a bigger epsilon (half of the max-width?) – Jozef Legény Jul 28 '15 at 08:09
  • Otherwise you could use a modified version of insertion sort. Init: find the rect in the unsorted table (UT) with the lowest Y and put it in the sorted table (ST). Loop: find the rect with the lowest Y in the UT and name it A. Go through the elements B in the ST. If A.y < B.y, go to the next element, unless B.y - B.h / 2 < A.y + A.h / 2 && B.x < A.x, in which case put B in front of A. If both conditions fail, put B after A. This might need some fine-tuning. It finds a position such that the inserted rectangle is before all rectangles that are 'higher' but still on the same 'line'. – Jozef Legény Jul 28 '15 at 08:16
  • I think your answer is enough for my case, so I will mark it as the answer. You may want to add the details from your comments to the answer as well to make it better. Thank you! – FlySoFast Jul 30 '15 at 02:17
  • Thank you. I'll update the answer with more details and an example. – Jozef Legény Jul 30 '15 at 08:45
  • I am porting this solution to Swift, but noticed there is a type issue in the main answer. Specifically, `(first.height / 2.0 < second.y)` returns a bool, while the full expression returns a CGFloat. Am I missing another operation that this could mean in Objective-C? – djds23 Mar 19 '22 at 16:01
  • @djds23 no, there was a missing parenthesis, edited the answer. – Jozef Legény Apr 01 '22 at 06:04
  • @JozefLegény the line with (second.width / 2.0) looks off, perhaps (second.width / 2.0) is missing a comparison? – Patrick May 20 '22 at 20:53
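
The tolerance idea from the comments above can be sketched without a global epsilon. A comparator that treats "y within 5" as equal is not transitive, so a generic sort may not behave predictably with it. One alternative, shown here as a rough sketch rather than a definitive implementation, is to sort by vertical center, split the boxes into rows, and then sort each row by x. The `TextBox` struct and `sortReadingOrder` are hypothetical names, and the row test (a box's center falls inside the vertical span of the row's first box, or the other way around) is an assumption that will likely need tuning for skewed or multi-column images.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a detected rectangle plus its OCR result.
// Assumption: (x, y) is the top-left corner and y grows downward.
struct TextBox {
    double x, y, w, h;
    std::string text;
};

// Returns the boxes row by row (top to bottom), each row ordered left to right.
std::vector<TextBox> sortReadingOrder(std::vector<TextBox> boxes) {
    // Step 1: order by vertical center so that boxes of one row become adjacent.
    std::sort(boxes.begin(), boxes.end(), [](const TextBox& a, const TextBox& b) {
        return a.y + a.h / 2.0 < b.y + b.h / 2.0;
    });

    // Step 2: walk the boxes, closing a row whenever the next box no longer
    // overlaps the first box of the current row vertically.
    std::size_t rowStart = 0;
    for (std::size_t i = 0; i <= boxes.size(); ++i) {
        bool endOfRow = (i == boxes.size());
        if (!endOfRow) {
            const TextBox& first = boxes[rowStart];
            double firstCenter = first.y + first.h / 2.0;
            double center = boxes[i].y + boxes[i].h / 2.0;
            bool centerInFirst = center <= first.y + first.h;
            bool firstCenterInBox = boxes[i].y <= firstCenter;
            endOfRow = !(centerInFirst || firstCenterInBox);
        }
        if (endOfRow) {
            // Step 3: order the finished row from left to right.
            std::sort(boxes.begin() + rowStart, boxes.begin() + i,
                      [](const TextBox& a, const TextBox& b) { return a.x < b.x; });
            rowStart = i;
        }
    }
    return boxes;
}
```

Because the tolerance comes from the height of the boxes in each row, a few large blocks among small ones should not distort the grouping the way an averaged epsilon could.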