I'm trying to read some entries from a table of data filled with a name and then columns of numbers. Here's the original picture:

[image: original screenshot of the table]

Between binarizing, converting to black/white, and just inverting, I found that inverting the image led to the best results.

 import PIL.ImageOps
 image = PIL.ImageOps.invert(image)

This lets me process roughly 90% or more of the columns as I scroll down to more images, but it still fails on a number of them. Sometimes the parentheses in a column get merged with the two numbers in that column. Is there any way to stop the parentheses from being mixed up with the numbers, or perhaps to remove all of the green text?

abrarisme

1 Answer

Resizing the image turned out to be what fixed the problem.

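import pyautogui
# region is (left, top, width, height); args.resize is an integer scale factor defined elsewhere
image = pyautogui.screenshot(region=(550, 354, 964, 552))
width, height = image.size
image = image.resize((args.resize*width, args.resize*height))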

I resized to at least 3x the original size. My guess is that this increases the pixel distance between characters, making it easier to tell where a digit ends and the following parenthesis begins.
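For anyone who wants to try this without taking a screenshot, here is a minimal sketch that combines the inversion from the question with the upscaling. The helper name `prepare_for_ocr`, the default factor of 3, and the choice of bicubic resampling are my own assumptions for illustration, and the blank white image just stands in for a real capture.

```python
from PIL import Image, ImageOps


def prepare_for_ocr(image, factor=3):
    """Invert an image and enlarge it by an integer factor (hypothetical helper)."""
    # Convert to RGB first so there is no alpha channel to invert.
    inverted = ImageOps.invert(image.convert("RGB"))
    width, height = inverted.size
    # Bicubic resampling is an assumption here; other filters may work as well.
    return inverted.resize((factor * width, factor * height), Image.BICUBIC)


# Stand-in for a real screenshot: a small white image.
sample = Image.new("RGB", (40, 20), "white")
result = prepare_for_ocr(sample)
print(result.size)
```

The factor is worth experimenting with; 3 is only the smallest value that worked in my case.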

Alternatively, the following is an even larger improvement:

import cv2

# output is the path of the saved screenshot
image = cv2.imread(output)
image = cv2.bitwise_not(image)  # invert colors
image = cv2.resize(image, None, fx=1.5, fy=1.7,
                               interpolation=cv2.INTER_CUBIC)  # scale
cv2.imwrite(output, image)

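The scaling is not uniform: the image is stretched slightly more vertically (fy=1.7) than horizontally (fx=1.5), and skewing it a bit like this works better than scaling both axes equally.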

abrarisme