I am working on a document processing application that generates and reads forms. The sample form attached is generated as a printed document, filled out by people, scanned and fed back to the application to detect filled values including Optical Marks (Bubbles), Text (OCR), etc. Click here for Sample Form.

Since scanning distorts the image in terms of rotation, scale and translation, I use the three markers to detect orientation and correct the image in a rather primitive way that is VERY expensive on computation and memory. Here is the gist of it:

  1. Read in the image from disk.
  2. Detect blobs using AForge.NET.
  3. Filter out the markers using shape, relative size and other properties.
  4. Calculate rotation and rotate image.
  5. Detect blobs from the rotated image using AForge.NET.
  6. Calculate scale and scale rotated image.
  7. Detect blobs from the scaled image using AForge.NET.
  8. Calculate translation and translate rotated, scaled image.
  9. Detect blobs from the translated image using AForge.NET.
  10. Filter out answer marks (bubbles) since I already have the positions of the original form.
  11. Extract mean color and compare to threshold to determine if the option is filled.

The above is extremely accurate but inefficient. I am looking to take a geometric approach instead: extract blobs only ONCE, filter out the markers and bubbles, and use simple math to figure out the expected positions of the bubbles relative to the markers. This should cut down the processing time by 80% and memory usage by 60%.

Alternatively, there HAS to be a way to apply all three transformations to a single image without one affecting the next. That would also remove the need to run blob detection three times.
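To make the geometric approach concrete, here is a minimal sketch of the math, written in Python for brevity (the arithmetic carries over to any language). It assumes the scan differs from the original form only by rotation, uniform scale and translation, and that each detected marker has already been matched to its counterpart on the form. The marker coordinates, the bubble position and the function names are hypothetical, chosen only for illustration.

```python
import cmath
import math


def fit_similarity(template_pts, scanned_pts):
    """Least-squares fit of q = a*p + b over matched (x, y) point pairs.

    Points are treated as complex numbers: a encodes rotation and
    uniform scale, b encodes translation.
    """
    p = [complex(x, y) for x, y in template_pts]
    q = [complex(x, y) for x, y in scanned_pts]
    n = len(p)
    p_mean = sum(p) / n
    q_mean = sum(q) / n
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b


def map_point(a, b, pt):
    """Map a point from form coordinates to scan coordinates."""
    z = a * complex(*pt) + b
    return (z.real, z.imag)


# Hypothetical marker centres on the original form, in pixels.
template_markers = [(100, 100), (900, 100), (100, 1300)]

# Simulated scan: 3 degrees of rotation, 2% enlargement, a small shift.
true_a = cmath.rect(1.02, math.radians(3.0))
true_b = complex(12, -7)
scanned_markers = [map_point(true_a, true_b, m) for m in template_markers]

# One fit from the detected markers replaces the three image transforms.
a, b = fit_similarity(template_markers, scanned_markers)
print("scale:", abs(a), "rotation (deg):", math.degrees(cmath.phase(a)))
print("bubble (250, 400) on the form maps to", map_point(a, b, (250, 400)))
```

With this, blob detection runs once to find the markers, and every known bubble position on the form is mapped into the scan with `map_point`, so the pixels can be sampled directly from the untransformed image. If the scanner also introduces skew or non-uniform scale, a full affine fit over the three markers would be needed instead.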

Raheel Khan
  • What specific question are you asking? – Daniel Mann Oct 09 '11 at 05:00
  • I want to figure out the coordinates of bubble markers relative to black markers mathematically instead of having to transform the actual image. – Raheel Khan Oct 09 '11 at 05:40
  • The goal is to read pixel values of those bubble markers to determine whether they have been filled-in by users. – Raheel Khan Oct 09 '11 at 05:41
  • What OCR software do you use and can't you use its output? – Gert Arnold Oct 09 '11 at 14:58
  • That is not possible in my case because not only is text optional for users, it can also mislead the application and decrease accuracy. – Raheel Khan Oct 10 '11 at 01:32
  • After more searching, the answer seems to be geometric / matrix transformation, although I am not familiar with that. The idea is to calculate one set of coordinates relative to another, given that you know all coordinates before the image was rotated / scaled / translated. – Raheel Khan Jan 05 '12 at 11:32

1 Answer

I would model the image and do the transformations on that model in memory instead of the actual image. Then once you have calculated the transformation matrix you can apply it to the actual image to do the OCR.

justin.m.chase
  • Thanks. Could you please elaborate on modeling the image? – Raheel Khan May 07 '12 at 03:10
  • I'm sorry, I think what I was saying is that if you could compute a rectangle from the blobs, then instead of rotating the image you can just apply the rotation to the rectangle object as a matrix transform. Then you should know where the blobs will be, and you can calculate the scale and translation the same way. Once you have the final transformation matrix, just apply it to the image and you should be done: only a single transformation of the image is performed. – justin.m.chase May 10 '12 at 20:31
  • Thanks. That is what I am looking for, but somehow the results do not appear to be accurate. When I apply rotation, scale and translation to the matrix and finally call transform, the results appear to be affected by the order of the calls. – Raheel Khan May 10 '12 at 23:15
  • Yes, the order of transformations makes a big difference. You should probably scale, rotate, then translate. – justin.m.chase May 12 '12 at 06:03
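
The order dependence discussed in the comments can be seen with plain 3x3 homogeneous matrices. The sketch below is illustrative only and does not use any particular graphics library; it assumes column vectors, so the matrix written rightmost is applied first. The helper names and the sample values are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale(s):
    return [[s, 0, 0], [0, s, 0], [0, 0, 1]]


def rotate(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def apply(m, pt):
    """Apply a 3x3 homogeneous matrix to an (x, y) point."""
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


S, R, T = scale(2), rotate(90), translate(10, 5)

# Scale first, then rotate, then translate: T * R * S.
scale_rotate_translate = matmul(T, matmul(R, S))

# Translate first, then rotate, then scale: S * R * T.
translate_rotate_scale = matmul(S, matmul(R, T))

print(apply(scale_rotate_translate, (1, 0)))
print(apply(translate_rotate_scale, (1, 0)))
```

The two results differ because in the second ordering the translation itself gets rotated and scaled. Uniform scale and rotation commute with each other, so the step that really matters is where the translation goes. Libraries also differ in whether each call prepends or appends to the current matrix, which is worth checking when the call sequence seems to give unexpected results.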