I would like to implement an OCR application that recognizes text in photos.

I succeeded in compiling and integrating the Tesseract engine in iOS, and I get reasonable recognition when photographing clear documents (or a photo of the same text taken from the screen). For other text, such as signposts, shop signs, or text on a coloured background, recognition fails.

The question is: what kind of image preprocessing is necessary to get better recognition? For example, I expect that we need to convert the images to grayscale or black and white, fix the contrast, and so on.

How can this be done in iOS? Is there a package for this?

alandalusi

2 Answers

I'm currently working on the same thing. I found that a PNG saved in Photoshop worked fine, but an image that originally came from the camera and was then imported into the app never worked. I can't explain why, but applying this function made those images work. Maybe it'll work for you too.

// this does the trick to have tesseract accept the UIImage.
UIImage * gs_convert_image (UIImage * src_img) {
    CGColorSpaceRef d_colorSpace = CGColorSpaceCreateDeviceRGB();
    /*
     * Note we specify 4 bytes per pixel here even though we ignore the
     * alpha value; you can't specify 3 bytes per-pixel.
     */
    size_t d_bytesPerRow = src_img.size.width * 4;
    unsigned char * imgData = (unsigned char*)malloc(src_img.size.height*d_bytesPerRow);
    CGContextRef context =  CGBitmapContextCreate(imgData, src_img.size.width,
                                                  src_img.size.height,
                                                  8, d_bytesPerRow,
                                                  d_colorSpace,
                                                  kCGImageAlphaNoneSkipFirst);

    UIGraphicsPushContext(context);
    // These next two lines 'flip' the drawing so it doesn't appear upside-down.
    CGContextTranslateCTM(context, 0.0, src_img.size.height);
    CGContextScaleCTM(context, 1.0, -1.0);
    // Use UIImage's drawInRect: instead of the CGContextDrawImage function, otherwise you'll have issues when the source image is in portrait orientation.
    [src_img drawInRect:CGRectMake(0.0, 0.0, src_img.size.width, src_img.size.height)];
    UIGraphicsPopContext();

    /*
     * At this point, we have the raw ARGB pixel data in the imgData buffer, so
     * we can perform whatever image processing here.
     */


    // After we've processed the raw data, turn it back into a UIImage instance.
    CGImageRef new_img = CGBitmapContextCreateImage(context);
    UIImage * convertedImage = [[UIImage alloc] initWithCGImage:
                                 new_img];

    CGImageRelease(new_img);
    CGContextRelease(context);
    CGColorSpaceRelease(d_colorSpace);
    free(imgData);
    return convertedImage;
}

I've also done a lot of experimentation preparing the image for Tesseract. Resizing, converting to grayscale, then adjusting brightness and contrast seems to work best.
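
As a rough illustration of the brightness and contrast step, here is a minimal C sketch that works on an 8-bit grayscale buffer. The function names and the choice to stretch around mid-gray (128) are my own assumptions for illustration, not what any particular library does. Plain C like this compiles unchanged inside an Objective-C file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(double v) {
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5); /* round to nearest */
}

/*
 * Adjust contrast and brightness of an 8-bit grayscale buffer in place.
 * contrast:   1.0 leaves values unchanged; values above 1.0 stretch them
 *             away from mid-gray (128). (Hypothetical parameterisation.)
 * brightness: offset in gray levels, added after the contrast stretch.
 */
static void adjust_brightness_contrast(uint8_t *gray, size_t count,
                                       double contrast, double brightness) {
    assert(gray != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        double v = (gray[i] - 128.0) * contrast + 128.0 + brightness;
        gray[i] = clamp_u8(v);
    }
}
```

The right contrast and brightness values depend on the photos, so expect to tune them by experiment.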

I've also tried the GPUImage library (https://github.com/BradLarson/GPUImage). Its GPUImageAverageLuminanceThresholdFilter seems to give me a great adjusted image, but Tesseract doesn't seem to work well with it.
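
To show the idea behind thresholding on average luminance, here is a small CPU-side sketch in C. It is not GPUImage's implementation (that filter runs on the GPU). The function names are made up, and the multiplier parameter is only modelled on the filter's thresholdMultiplier setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean gray level of the buffer; returns 0 for an empty buffer. */
static double average_luminance(const uint8_t *gray, size_t count) {
    assert(gray != NULL || count == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += gray[i];
    }
    return count ? (double)sum / (double)count : 0.0;
}

/*
 * Binarize an 8-bit grayscale buffer in place: pixels brighter than
 * (average * multiplier) become white (255), all others black (0).
 * A multiplier of 1.0 thresholds exactly at the average.
 */
static void threshold_by_average(uint8_t *gray, size_t count, double multiplier) {
    double cutoff = average_luminance(gray, count) * multiplier;
    for (size_t i = 0; i < count; i++) {
        gray[i] = (gray[i] > cutoff) ? 255 : 0;
    }
}
```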

I've also added OpenCV to my project and plan to try out its image routines, possibly even some box detection to find the text area (I'm hoping this will speed up Tesseract).
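
For a sense of what finding the text area could mean at its simplest, here is a C sketch that returns the bounding box of all dark pixels in a grayscale buffer. This is not OpenCV and is far cruder than real text detection. It assumes dark text on a light background, and the text_box type, function name and cutoff parameter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bounding box in pixel coordinates; found is 0 when no dark pixel exists. */
typedef struct {
    size_t x, y, width, height;
    int found;
} text_box;

/*
 * Scan a row-major 8-bit grayscale image and return the smallest rectangle
 * containing every pixel darker than cutoff.
 */
static text_box dark_pixel_bounds(const uint8_t *gray, size_t width,
                                  size_t height, uint8_t cutoff) {
    assert(gray != NULL || width * height == 0);
    size_t min_x = width, min_y = height, max_x = 0, max_y = 0;
    int found = 0;
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            if (gray[y * width + x] < cutoff) {
                if (x < min_x) min_x = x;
                if (x > max_x) max_x = x;
                if (y < min_y) min_y = y;
                if (y > max_y) max_y = y;
                found = 1;
            }
        }
    }
    text_box box = {0, 0, 0, 0, 0};
    if (found) {
        box.x = min_x;
        box.y = min_y;
        box.width = max_x - min_x + 1;
        box.height = max_y - min_y + 1;
        box.found = 1;
    }
    return box;
}
```

Cropping to such a box before OCR reduces the number of pixels to process, though a single stray dark pixel will enlarge the box.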

roocell
  • Even after adding gs_convert_image() I get the same result as before. Is there any way to improve the accuracy of Tesseract's scanning? – Shanmugaraja G Oct 08 '14 at 05:06
  • Were you ever able to figure out why OCR would work on saved images but not on images from the camera? I'm having the same issue now, but I'm working in Swift and don't know how to implement your above code. I just posted about it here http://stackoverflow.com/questions/29336501/tesseract-ocr-w-ios-swift-returns-error-or-gibberish then found your answer. Seems related. Any ideas? – Andrew Mar 30 '15 at 02:12
  • @Andrew Did you write this code in Swift 3? If so, please add it as an answer to this question. It would be helpful to me. –  Dec 28 '16 at 13:59

I have used the code above, but added two other function calls to convert the image so that it works with Tesseract.

First, I used an image resize routine to scale the image to 640 x 640, which seems to be more manageable for Tesseract.

-(UIImage *)resizeImage:(UIImage *)image {

    CGImageRef imageRef = [image CGImage];
    CGImageAlphaInfo alphaInfo = CGImageGetAlphaInfo(imageRef);
    CGColorSpaceRef colorSpaceInfo = CGColorSpaceCreateDeviceRGB();

    if (alphaInfo == kCGImageAlphaNone)
        alphaInfo = kCGImageAlphaNoneSkipLast;

    int width, height;

    width = 640;//[image size].width;
    height = 640;//[image size].height;

    CGContextRef bitmap;

    // Passing 0 for bytesPerRow lets Core Graphics compute the row stride for the
    // new width; the source image's stride is only valid for the source width.
    if (image.imageOrientation == UIImageOrientationUp || image.imageOrientation == UIImageOrientationDown) {
        bitmap = CGBitmapContextCreate(NULL, width, height, CGImageGetBitsPerComponent(imageRef), 0, colorSpaceInfo, alphaInfo);

    } else {
        bitmap = CGBitmapContextCreate(NULL, height, width, CGImageGetBitsPerComponent(imageRef), 0, colorSpaceInfo, alphaInfo);

    }

    if (image.imageOrientation == UIImageOrientationLeft) {
        NSLog(@"image orientation left");
        CGContextRotateCTM (bitmap, radians(90));
        CGContextTranslateCTM (bitmap, 0, -height);

    } else if (image.imageOrientation == UIImageOrientationRight) {
        NSLog(@"image orientation right");
        CGContextRotateCTM (bitmap, radians(-90));
        CGContextTranslateCTM (bitmap, -width, 0);

    } else if (image.imageOrientation == UIImageOrientationUp) {
        NSLog(@"image orientation up");

    } else if (image.imageOrientation == UIImageOrientationDown) {
        NSLog(@"image orientation down");
        CGContextTranslateCTM (bitmap, width,height);
        CGContextRotateCTM (bitmap, radians(-180.));

    }

    CGContextDrawImage(bitmap, CGRectMake(0, 0, width, height), imageRef);
    CGImageRef ref = CGBitmapContextCreateImage(bitmap);
    UIImage *result = [UIImage imageWithCGImage:ref];

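    CGContextRelease(bitmap);
    CGImageRelease(ref);
    // Core Foundation objects are not managed by ARC, so release the color space too.
    CGColorSpaceRelease(colorSpaceInfo);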

    return result;
}

For radians() to work, make sure you declare it above the @implementation (it needs M_PI from math.h):

static inline double radians (double degrees) {return degrees * M_PI/180;}
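
One thing to be aware of: a fixed 640 x 640 target stretches any photo that is not square. If that hurts recognition, a target size that keeps the aspect ratio can be computed first. The following C sketch shows one way. The pixel_size type and fit_within function are hypothetical names, and the result would replace the hard-coded width and height above.

```c
#include <assert.h>
#include <stddef.h>

/* Width and height in pixels. */
typedef struct {
    size_t width, height;
} pixel_size;

/*
 * Scale (src_w, src_h) so that the longer side equals max_side while the
 * aspect ratio is preserved (rounded to the nearest pixel, at least 1).
 * Note that images smaller than max_side are scaled up as well.
 */
static pixel_size fit_within(size_t src_w, size_t src_h, size_t max_side) {
    assert(src_w > 0 && src_h > 0 && max_side > 0);
    pixel_size out;
    if (src_w >= src_h) {
        out.width = max_side;
        out.height = (src_h * max_side + src_w / 2) / src_w;
    } else {
        out.height = max_side;
        out.width = (src_w * max_side + src_h / 2) / src_h;
    }
    if (out.width == 0) out.width = 1;
    if (out.height == 0) out.height = 1;
    return out;
}
```

For a 3264 x 2448 camera photo and a maximum side of 640, this gives 640 x 480.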

Then I convert to grayscale.

I found this article, Convert image to grayscale, on converting to grayscale.

I have used the code from there successfully and can now read text in different colours on different coloured backgrounds.

I have modified the code slightly so that it works as a method within a class, rather than as a class of its own as in the original.

- (UIImage *) toGrayscale:(UIImage*)img
{
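    // Byte offsets within each pixel. The context below stores RGBA as a
    // little-endian 32-bit value, so the bytes in memory are A, B, G, R.
    const int RED = 3;
    const int GREEN = 2;
    const int BLUE = 1;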

    // Create image rectangle with current image width/height
    CGRect imageRect = CGRectMake(0, 0, img.size.width * img.scale, img.size.height * img.scale);

    int width = imageRect.size.width;
    int height = imageRect.size.height;

    // the pixels will be painted to this array
    uint32_t *pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));

    // clear the pixels so any transparency is preserved
    memset(pixels, 0, width * height * sizeof(uint32_t));

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

    // create a context with RGBA pixels
    CGContextRef context = CGBitmapContextCreate(pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);

    // paint the bitmap to our context which will fill in the pixels array
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), [img CGImage]);

    for(int y = 0; y < height; y++) {
        for(int x = 0; x < width; x++) {
            uint8_t *rgbaPixel = (uint8_t *) &pixels[y * width + x];

            // convert to grayscale using recommended method:     http://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale
            uint32_t gray = 0.3 * rgbaPixel[RED] + 0.59 * rgbaPixel[GREEN] + 0.11 * rgbaPixel[BLUE];

            // set the pixels to gray
            rgbaPixel[RED] = gray;
            rgbaPixel[GREEN] = gray;
            rgbaPixel[BLUE] = gray;
        }
    }

    // create a new CGImageRef from our context with the modified pixels
    CGImageRef image = CGBitmapContextCreateImage(context);

    // we're done with the context, color space, and pixels
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);
    free(pixels);

    // make a new UIImage to return
    UIImage *resultUIImage = [UIImage imageWithCGImage:image
                                             scale:img.scale
                                       orientation:UIImageOrientationUp];

    // we're done with image now too
    CGImageRelease(image);

    return resultUIImage;
}
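
A note on the channel offsets used when reading pixels byte by byte: as far as I understand the flags, kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast describes each pixel as a 32-bit RGBA value stored little-endian, which puts the bytes in memory in the order A, B, G, R. The C sketch below only demonstrates that byte layout. pack_rgba, channel_at and host_is_little_endian are illustrative helpers, not Core Graphics calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets of each channel when a 32-bit RGBA value is stored little-endian. */
enum { ALPHA = 0, BLUE = 1, GREEN = 2, RED = 3 };

/* Pack channels as an RGBA 32-bit value: R in the top byte, A in the bottom. */
static uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8) | a;
}

/* Returns 1 when the host stores the least significant byte first. */
static int host_is_little_endian(void) {
    uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Read one channel by its byte offset in memory, as the pixel loop does. */
static uint8_t channel_at(const uint32_t *pixel, int offset) {
    assert(pixel != NULL && offset >= 0 && offset < 4);
    uint8_t bytes[4];
    memcpy(bytes, pixel, sizeof bytes);
    return bytes[offset];
}
```

On a little-endian host (which iOS devices are), channel_at(&p, RED) returns the red value passed to pack_rgba. Getting red and blue the wrong way round does not crash anything, it just applies the 0.3 and 0.11 weights to the wrong channels.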
Adam Richardson
  • I have been trying this, and my images are converted. However, the UIImage still crashes on my iPhone. Any suggestions? Can you provide your source code? – Tha Leang May 01 '13 at 02:26
  • Are you returning an image from the camera or are you loading it from another source? Also, the code I have provided above assumes that you are using ARC. If you are not, you will need to release the image and other objects at the appropriate time, otherwise you will get crashes due to memory pressure. – Adam Richardson May 01 '13 at 08:50
  • "image.imageOrientation == UIImageOrientationUp | image.imageOrientation == UIImageOrientationDown" ? – pronebird Nov 01 '14 at 14:05
  • I'm trying the above code and I'm getting "Use of undeclared identifier radians". – Daniel P Nov 22 '14 at 11:42
  • @daniel-p Ensure you have math.h included. Then, before the implementation in the view controller, add the following: static inline double radians (double degrees) {return degrees * M_PI/180;} – Adam Richardson Nov 22 '14 at 18:23
  • @Adam Richardson I changed the code as you suggested, but I am still not getting accurate results. – Aruna kumari Dec 10 '14 at 12:04