
I'm going through images and finding "buttons" based on their text. My thought process:

  • Find the text in the image.
  • Find the solid background color near that text.
  • Using that background color, find all contiguous pixels matching it (similar to Photoshop's magic wand tool). That region is my box/button. I then need to store the top-left and bottom-right coordinates of that box.

So far, I can do the first two bullets, but the third is eluding me. Any tips on how to implement a "magic wand" selection of pixels based on color?

import cv2
import easyocr

img = cv2.imread('C:/Users/xyz/PycharmProjects/findTextImage_v0/testBank/sampleScreen.png')

text = easyocr.Reader(['en'])
result = text.readtext(img)

exit_coord = []
top_left = []

for item in result:
    if "EXIT" in item:
        exit_coord = [item[0], item[1]]
        top_left = exit_coord[0][0]
        print(exit_coord[0][0])
    else:
        print("False")

# Define the button background color at exit coordinates
background_color = img[top_left[1], top_left[0]]
print(background_color)



1 Answer


I think I've gotten this pretty close (major credit to this solution: https://stackoverflow.com/a/46667829/18270545). I'm happy to see other suggestions on more efficient ways to tackle this.

import cv2
import easyocr
import numpy as np

img = cv2.imread('C:/Users/xyz/PycharmProjects/findTextImage_v0/testBank/sampleScreen.png')

text = easyocr.Reader(['en'])
result = text.readtext(img)

exit_coord = []
top_left = []

for item in result:
    if "EXIT" in item:
        exit_coord = [item[0], item[1]]
        top_left = exit_coord[0][0]
        print(exit_coord[0][0])
    else:
        print("False")

# Define the button background color at exit coordinates
background_color = img[top_left[1], top_left[0]]
print(background_color)

# Find button based on background color via flood-fill
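h, w, chn = img.shape
# floodFill expects the seed as an (x, y) tuple of plain ints
seed = (int(top_left[0]), int(top_left[1]))
# The mask must be 2 pixels larger than the image in each dimension
mask = np.zeros((h+2, w+2), np.uint8)

floodflags = 4  # 4-connectivity
floodflags |= cv2.FLOODFILL_MASK_ONLY  # write to the mask, leave img untouched
floodflags |= (255 << 8)  # value written into the mask for filled pixels

num, img, mask, rect = cv2.floodFill(img, mask, seed, (255, 0, 0), (10,)*3, (10,)*3, floodflags)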

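# rect is (x, y, width, height) of the filled region
button_top_left = (rect[0], rect[1])
button_bottom_right = (rect[0] + rect[2] - 1, rect[1] + rect[3] - 1)

# Print top-left and bottom-right coordinates, then width and height
print(button_top_left)
print(button_bottom_right)
print(rect[2])
print(rect[3])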

# Show image
cv2.imshow("final image", mask)
cv2.waitKey(0)
cv2.destroyAllWindows()
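
For anyone curious what the flood fill is doing under the hood, here is a minimal pure-Python sketch of the same "magic wand" idea. It is not what OpenCV runs internally, just an illustration: the function name magic_wand_bbox, the tol parameter, and the tiny synthetic image are all made up for this example. Note that it compares every pixel against the seed color (a fixed range), whereas the floodFill call above, without cv2.FLOODFILL_FIXED_RANGE, compares each pixel against its already-filled neighbors.

```python
from collections import deque


def magic_wand_bbox(pixels, seed, tol=10):
    """Flood-fill from seed and return the filled region's bounding box.

    pixels: list of rows, each row a list of (b, g, r) tuples (hypothetical format).
    seed:   (x, y) start point.
    tol:    max per-channel difference from the seed color (assumed default).
    Returns ((min_x, min_y), (max_x, max_y)), i.e. top-left and bottom-right.
    """
    h, w = len(pixels), len(pixels[0])
    sx, sy = seed
    target = pixels[sy][sx]
    seen = {(sx, sy)}
    queue = deque([(sx, sy)])
    min_x = max_x = sx
    min_y = max_y = sy

    while queue:
        x, y = queue.popleft()
        # Grow the bounding box to include this pixel
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
        # Visit the 4-connected neighbors
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen:
                color = pixels[ny][nx]
                if all(abs(a - b) <= tol for a, b in zip(color, target)):
                    seen.add((nx, ny))
                    queue.append((nx, ny))

    return (min_x, min_y), (max_x, max_y)


# Synthetic 8x6 image: white background, a colored "button", one dark "text" pixel
W, H = 8, 6
image = [[(255, 255, 255)] * W for _ in range(H)]
for y in range(1, 4):
    for x in range(2, 6):
        image[y][x] = (200, 50, 50)
image[2][3] = (0, 0, 0)

top_left, bottom_right = magic_wand_bbox(image, (2, 1))
print(top_left, bottom_right)  # (2, 1) (5, 3)
```

On a real screenshot this would be far slower than cv2.floodFill, so I'd only use it to understand the behavior or to experiment with the tolerance.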