
I have been working on OCR'ing images with Python for a while, but there is still room for improvement, so your input and thoughts would be helpful.

This is what I am currently doing, and the rate of successfully getting a valid ocrText output is around 15%.

import cv2
import numpy as np
import pyocr
import pyocr.builders
from PIL import Image

ocrImage = cv2.imread(imgName)
ocrImage = cv2.resize(ocrImage, None, fx=3, fy=3, interpolation=cv2.INTER_LINEAR)  # enlarge 3x
ocrImage = cv2.cvtColor(ocrImage, cv2.COLOR_BGR2GRAY)  # convert to grayscale
ret, ocrImage = cv2.threshold(ocrImage, 127, 255, cv2.THRESH_BINARY)  # convert to black and white
ocrImage = cv2.morphologyEx(ocrImage, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8))  # remove small noise specks
ocrImage = cv2.morphologyEx(ocrImage, cv2.MORPH_CLOSE, np.ones((4, 4), np.uint8))  # fill small white gaps
cv2.imwrite(ImageName, ocrImage)
ocrText = ocrTool.image_to_string(Image.open(ImageName), builder=pyocr.builders.TextBuilder())

While trying to improve this, I found an 'opencv-color-spaces' blog post that uses the code below to plot the image's pixels in a 3D model. I can see that all the background noise pixels are different shades of gray and fall within a fairly well-defined region. I feel this could help me filter them out before running my code above, but I have no idea how to do it.

import cv2
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
from mpl_toolkits.mplot3d import Axes3D  # enables the "3d" projection

nemo0 = cv2.imread(ImageName1, 1)
nemo1 = cv2.cvtColor(nemo0, cv2.COLOR_BGR2RGB)
r, g, b = cv2.split(nemo1)
fig = plt.figure()
axis = fig.add_subplot(1, 1, 1, projection="3d")
pixel_colors = nemo1.reshape((np.shape(nemo1)[0] * np.shape(nemo1)[1], 3))
norm = colors.Normalize(vmin=-1.0, vmax=1.0)
norm.autoscale(pixel_colors)
pixel_colors = norm(pixel_colors).tolist()
axis.scatter(r.flatten(), g.flatten(), b.flatten(), facecolors=pixel_colors, marker=".")
axis.set_xlabel("Red")
axis.set_ylabel("Green")
axis.set_zlabel("Blue")
currentFig1 = plt.gcf()
currentFig1.savefig(ImageName1.replace(Path, pltPath))

I'd like to ask for help: is there a function, or some code, that can quickly remove the gray lines before I proceed to process the image?

The example image is in the link here.

Alvin Lin
  • Adding the original input image would help! An approach is to preprocess the image before putting into OCR such as tesseract – nathancy Aug 08 '19 at 20:08
  • Breaking captchas may be unethical. – fmw42 Aug 08 '19 at 23:37
  • Thanks for all your comments. The original input image was in the left-upper corner of the 3D image in the very last line as a link of my question. – Alvin Lin Aug 09 '19 at 02:34
