Finding bounding box of text within JPG image

Question

My question is similar to this one, but is more specific in scope.

In my card game application, I would like for users to be able to click on words located in a scanned jpeg image. Please see this sample Pokemon trading card.

In this case, the user should be able to hover his mouse over the text "Scratch", upon which a pulsing rectangular border will appear around the text, indicating that it is clickable. The problem is how to detect the border of the text. There will be an array of words KNOWN BEFOREHAND that the user may click on (these will be retrieved from a database on a card-by-card basis). To continue our example, the array in this case will be ["Scratch", "Live Coal"]. Once the user clicks on "Scratch", the application must know via a call-back that "Scratch" was chosen instead of "Live Coal".

I was thinking of using optical character recognition libraries to solve this problem, but the open-source options for this are poor in quality (e.g. GOCR) and/or not well-tested on multiple platforms (e.g. Tesseract). I only care about Windows and Mac compatibility. Am I missing an obvious/simpler solution/algorithm that does not require OCR? I cannot simply hand-code in bounding boxes for each card, as there will be thousands of scanned cards in my database. The user may also upload his own custom card scans with an accompanying array of clickable text.

Text color is not always black. See this panorama of different card and text styles that will be permitted. The black cards have white text, and the third-to-last card (Zekrom) has black text with a white outline.

Solutions in any programming language are appreciated. However, please note that I am looking for open-source algorithms and/or libraries. If there is a solution in Ruby or Java, even better, as my code is primarily in these two languages.

EDIT: I forgot to mention that the order of the words/phrases in the array will be the same as on the card. Thus, the array will be ["Scratch", "Live Coal"] instead of ["Live Coal", "Scratch"]. I am mentioning this because it can potentially simplify the task. Thus, for this example, I can simply look for black pixels (though I have to watch out for the black star in the white circle). However, there will be more difficult cases where there is descriptive text under the attack name in a smaller font (again, see the panorama for examples).
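One way the ordering hint might be exploited is sketched below in Java. It counts dark pixels per row, groups consecutive text rows into horizontal bands, and pairs the bands with the phrases from top to bottom. Skipping bands shorter than `minHeight` (to ignore the smaller descriptive text) is an assumption of this sketch, as are the class name `OrderedTextBands`, the method names, and every threshold. Cards with light text on a dark background would need the darkness test inverted.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedTextBands {
    /** True if the packed RGB pixel's Rec. 601 luma (0..255) is below threshold. */
    static boolean isDark(int rgb, int threshold) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000 < threshold;
    }

    /** Returns {top, bottom} (inclusive) for each run of rows with at least minDark dark pixels. */
    static List<int[]> findBands(BufferedImage img, int threshold, int minDark) {
        List<int[]> bands = new ArrayList<>();
        int start = -1; // first row of the band currently being scanned, or -1
        for (int y = 0; y < img.getHeight(); y++) {
            int dark = 0;
            for (int x = 0; x < img.getWidth(); x++) {
                if (isDark(img.getRGB(x, y), threshold)) dark++;
            }
            boolean textRow = dark >= minDark;
            if (textRow && start < 0) start = y;
            if (!textRow && start >= 0) {
                bands.add(new int[] {start, y - 1});
                start = -1;
            }
        }
        if (start >= 0) bands.add(new int[] {start, img.getHeight() - 1});
        return bands;
    }

    /** Pairs the i-th band that is at least minHeight rows tall with the i-th phrase. */
    static Map<String, int[]> assign(List<int[]> bands, List<String> phrases, int minHeight) {
        Map<String, int[]> out = new LinkedHashMap<>();
        int next = 0; // index of the next phrase waiting for a band
        for (int[] band : bands) {
            if (next >= phrases.size()) break;
            if (band[1] - band[0] + 1 >= minHeight) out.put(phrases.get(next++), band);
        }
        return out;
    }
}
```

This only yields the vertical extent of each phrase. The left and right edges would still have to be found within each band, and the black star in the white circle would have to be excluded, for example by cropping the region first.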

If you allow users to upload images **and** the accompanying text, how are you planning on verifying that what they type in actually matches the text? Also, will orientation of the card matter (what happens if a card is shown sideways, or at some odd angle)? What you may wish to do is blank out the existing text, then *generate* text in place - easier to generate bounding boxes, easier to translate (as necessary), no (or little, depending) OCR. The only real image processing to do is determining the bounding box of the 'move' section (which you may want to do regardless, to pre-limit the OCR area). — Clockwork-Muse, Jul 14 '11 at 21:22
@X-Zero Thanks, this sounds like a feasible solution. How would you suggest I detect the bounding box of the 'move' section? — Stephen Lam, Jul 14 '11 at 21:53
Depending on what other stuff is supposed to be clickable, it's basically everything that isn't the main picture - all of which (seem to) have a fairly well defined border. There's also a horizontal bar at about the card's midpoint, which seems to have decent contrast (sorry, I've never done image analysis). In any case, if the users can upload (and presumably make) their own cards, you may just wish to make a card creator, where they can type in their own text, and include some sort of custom image of the pokemon; that should be even simpler. At the moment, you have to recognize the card.. — Clockwork-Muse, Jul 14 '11 at 22:52

score 1 · Answer 1 · answered Jul 14 '11 at 20:16

For simplicity, I would just write a program that lets you draw a bounding box around your text by hand, but you could also do this by detecting differences in pixel color. Since the text is black, you could look for the upper-leftmost black pixel that is not heavily indented and lies within the bottom half of the card.
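A minimal Java sketch of the pixel-color idea, assuming dark text on a light background: it returns the smallest rectangle that encloses every dark pixel inside a search region. The class name `DarkPixelBounds`, the method names, and the luma threshold are hypothetical choices for illustration, not part of any library.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

class DarkPixelBounds {
    /** Rec. 601 luma (0..255) of a packed RGB pixel. */
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /**
     * Smallest rectangle inside region that contains every pixel whose luma
     * is below threshold, or null if the region holds no such pixel.
     */
    static Rectangle boundsOfDarkPixels(BufferedImage img, Rectangle region, int threshold) {
        // Clip the search region to the image so getRGB never goes out of range.
        Rectangle r = region.intersection(new Rectangle(img.getWidth(), img.getHeight()));
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = -1, maxY = -1;
        for (int y = r.y; y < r.y + r.height; y++) {
            for (int x = r.x; x < r.x + r.width; x++) {
                if (luma(img.getRGB(x, y)) < threshold) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) return null; // no dark pixel found
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }
}
```

To restrict the search to the bottom half of the card, the region could be `new Rectangle(0, img.getHeight() / 2, img.getWidth(), img.getHeight() / 2)`. Any dark artwork or border inside the region would be included in the box, so the region has to be chosen with care.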

Stas Jaro

A problem with this is that the card formats vary greatly. I also need to recognize the phrases "Pokemon Power: Matter Exchange" and "Mind Shock" on this [card](http://pokebeach.com/scans/team-rocket/39-dark-kadabra.jpg) – Stephen Lam Jul 14 '11 at 20:47

score 0 · Answer 2 · answered Jul 14 '11 at 20:23

When the cursor is stationary, check whether there is a black pixel either directly underneath it or within 4 pixels of it. If there is, scan to the left, to the right, upward, and downward from the cursor until you reach three consecutive non-black pixels (three, because there may still be a non-black pixel between the letters). Use these four locations to draw a box. You can use OpenCV.
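A rough Java sketch of this scan, using only `java.awt` rather than OpenCV. It is a variant of the description above, not the exact procedure: instead of scanning four single lines from the cursor, it grows a box outward from the first dark pixel found near the cursor, and stops on each side once a strip of `gapX` columns (or `gapY` rows) just outside the box contains no dark pixel. The class name `CursorTextBox`, the method names, and all parameters are hypothetical.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

class CursorTextBox {
    /** True if (x, y) is inside the image and its Rec. 601 luma is below threshold. */
    static boolean isDark(BufferedImage img, int x, int y, int threshold) {
        if (x < 0 || y < 0 || x >= img.getWidth() || y >= img.getHeight()) return false;
        int rgb = img.getRGB(x, y);
        int luma = (299 * ((rgb >> 16) & 0xFF) + 587 * ((rgb >> 8) & 0xFF) + 114 * (rgb & 0xFF)) / 1000;
        return luma < threshold;
    }

    /** True if any pixel in the inclusive range [x0..x1] x [y0..y1] is dark. */
    static boolean anyDark(BufferedImage img, int x0, int y0, int x1, int y1, int threshold) {
        for (int y = y0; y <= y1; y++)
            for (int x = x0; x <= x1; x++)
                if (isDark(img, x, y, threshold)) return true;
        return false;
    }

    /** Box around the text near the cursor (cx, cy), or null if no dark pixel lies within radius. */
    static Rectangle growFrom(BufferedImage img, int cx, int cy, int radius,
                              int gapX, int gapY, int threshold) {
        int left = -1, top = -1;
        search:
        for (int y = cy - radius; y <= cy + radius; y++) {
            for (int x = cx - radius; x <= cx + radius; x++) {
                if (isDark(img, x, y, threshold)) { left = x; top = y; break search; }
            }
        }
        if (left < 0) return null; // cursor is not near any text
        int right = left, bottom = top;
        boolean grew = true;
        while (grew) { // extend each side by one pixel while dark pixels lie within the gap
            grew = false;
            if (anyDark(img, left - gapX, top, left - 1, bottom, threshold)) { left--; grew = true; }
            if (anyDark(img, right + 1, top, right + gapX, bottom, threshold)) { right++; grew = true; }
            if (anyDark(img, left, top - gapY, right, top - 1, threshold)) { top--; grew = true; }
            if (anyDark(img, left, bottom + 1, right, bottom + gapY, threshold)) { bottom++; grew = true; }
        }
        return new Rectangle(left, top, right - left + 1, bottom - top + 1);
    }
}
```

A `gapX` wider than the space between words lets the box cover a whole phrase such as "Live Coal", while a `gapY` smaller than the line spacing keeps it from swallowing the line below. Both values depend on the scan resolution and would need tuning per card style.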

TookTheRook

Good thinking, but a single rectangular border should be drawn around the entire word or phrase. For example, in the OpenCV page you linked to, there is a red rectangle around the "Add <...>\OpenCV\bin to the system PATH" in the first figure. – Stephen Lam Jul 14 '11 at 20:29
How about this. You draw transparent rectangles around each text area beforehand. So for each card object, have an associated arrayList of rectangles, which contain the coordinates of the rectangles. Now, whenever a user moves the mouse, do a check of whether the user is within one of those 'hidden' rectangles. If he/she is, just change the color of that rectangle to black or something? – TookTheRook Jul 14 '11 at 20:32
If you do not know the cards beforehand, just save images of the pre-defined words, such as 'scratch', and when a user scans a new card, check whether the card contains any of the images you have stored. If it does, you can detect where the image is and draw a line around it... – TookTheRook Jul 14 '11 at 20:38
That may work, but I do not make the cards and therefore do not have the fonts of the text I am trying to recognize. Besides, how would I check if the card contains an image? There is the background texture to deal with. – Stephen Lam Jul 14 '11 at 20:42
In OpenCV, you can convert images to black and white or greyscale. That makes comparing images easier, so you won't have to worry about the background then. This sounds like a cool idea though: playing Pokemon cards online with people, using your webcam to show the cards or something. – TookTheRook Jul 14 '11 at 20:49
About the transparent rectangles idea, I don't draw the cards or know the coordinates of the rectangles; instead, I am trying to calculate the coordinates of the rectangles through image processing. The physical cards are scanned in. – Stephen Lam Jul 14 '11 at 20:50
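Whichever image-processing step ends up producing the rectangles, the hit test suggested in the comments above is the easy part. A minimal Java sketch, assuming the boxes have already been computed: it maps each phrase to its rectangle and returns the phrase under the mouse. The class name `ClickableRegions`, its methods, and the coordinates in the usage lines are hypothetical.

```java
import java.awt.Rectangle;
import java.util.LinkedHashMap;
import java.util.Map;

class ClickableRegions {
    // Insertion order is kept so phrases stay in the same order as on the card.
    private final Map<String, Rectangle> boxes = new LinkedHashMap<>();

    /** Registers the bounding box computed for one clickable phrase. */
    void add(String phrase, Rectangle box) {
        boxes.put(phrase, box);
    }

    /** Returns the phrase whose box contains the point (x, y), or null if none does. */
    String phraseAt(int x, int y) {
        for (Map.Entry<String, Rectangle> e : boxes.entrySet()) {
            if (e.getValue().contains(x, y)) return e.getKey();
        }
        return null;
    }
}
```

A mouse-move handler could call `phraseAt` to decide which border to pulse, and a click handler could pass the returned phrase to the callback:

```java
ClickableRegions card = new ClickableRegions();
card.add("Scratch", new Rectangle(60, 300, 90, 20));    // made-up coordinates
card.add("Live Coal", new Rectangle(60, 360, 110, 20)); // made-up coordinates
String chosen = card.phraseAt(70, 310);                 // inside the "Scratch" box
```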
