Unable to extract text from those images

Question

I tried to detect and extract text from the images below, but I cannot get the header text extracted properly.

Image 1:

Image 2:

I am unable to reliably detect and extract the text from images of this kind. Please help me with them.

I tried the code below:

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"

# Load image and threshold
image = cv2.imread(r"C:\Users\Admin\Downloads\Table_result\Table_result\semi train\Caffia coffee_before.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Connect text with a horizontal shaped kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10, 3))
dilate = cv2.dilate(thresh, kernel, iterations=1)

# Remove non-text contours using aspect ratio filtering
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x, y, w, h = cv2.boundingRect(c)
    aspect = w / h
    if aspect < 3:
        cv2.drawContours(thresh, [c], -1, (0, 0, 0), -1)

# Invert image and OCR
result = 255 - thresh
data = pytesseract.image_to_string(result, config='-l eng --oem 2 --psm 6')
print(data)

Results from my code:

For Image 1:

FD Product Cust. Prod. Product Description Pack Size Qty Weight Unit Line Value V
Code Code Price
bl [SnR] o1 Each 1.00 £0.00 FX0]
ISR XA oY) Pack 10 | 1.00 [N £2.05
2350500 Chillies Green 1x500 gm| 1.00 £3.13 £3.13

For Image 2:

POR245 Caffia Alliance RFA FD Coffee 3 Pint Sachets (x 120) 30 61.40 1,842.00 0.00

For Image 1, the header is extracted, but the content rows are not extracted well. For Image 2, the header section is not extracted at all, but the content data is extracted well.
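One thing that may be worth checking is the aspect-ratio filter in the code above: every dilated contour whose width is less than three times its height is painted out before OCR, so short table cells could be erased along with the noise. The sketch below only illustrates that rule on made-up bounding boxes. The `split_by_aspect` helper and the box sizes are hypothetical and are not measured from the two images.

```python
def split_by_aspect(boxes, min_aspect=3.0):
    """Split (x, y, w, h) boxes using the same rule as the contour loop.

    Boxes with w / h < min_aspect are the ones that loop would paint out.
    """
    kept, erased = [], []
    for box in boxes:
        x, y, w, h = box
        if w / h < min_aspect:
            erased.append(box)
        else:
            kept.append(box)
    return kept, erased


# Hypothetical box sizes in pixels, chosen only for illustration.
boxes = [
    (10, 5, 400, 20),   # a long header line, aspect 20.0
    (10, 40, 45, 18),   # a short cell such as "1.00", aspect 2.5
    (80, 40, 120, 18),  # a longer cell, aspect about 6.7
]

kept, erased = split_by_aspect(boxes)
print("kept:", kept)
print("erased:", erased)
```

If the short cells in the real images turn out to land in the erased group, lowering the threshold or widening the dilation kernel might be worth trying. Whether that is the actual cause here is an assumption.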

What did you get from your code? And what have you researched to improve it? — Phung Duy Phong, Jan 13 '20 at 10:52
Please check my updated question. Here I will give the result from my code. Please give some suggestions. — Vijay, Jan 13 '20 at 11:22
Ok, you need to approach it in different way, I'm not familiar with opencv2 but let me try — Phung Duy Phong, Jan 13 '20 at 11:25

