I am very new to Computer Vision. I have lots of images like this:
I want to extract the entire table as text. I tried pytesseract
to extract the text from the image, using the sample code below:
from PIL import Image
import pytesseract

# Load the screenshot and run Tesseract with the English language model.
im = Image.open('/home/Downloads/b.png')
text = pytesseract.image_to_string(im, lang='eng')
print(text)
But the results are really bad. A sample of the output:
II) Han H31 Precvsva 111
II) Pegalran Corn m
11) Quama camume. m
15) Sansmlg Eledra. KR
II) snaru Corn/Japan 11>
II) 15 msnlay Co 1111 KR
13)]ah1lC1rcuvl Inc us
II) Iaman Semioan... 1w
I1)Japan msulay Inc 11>
I1) Schneider Fleck... 511
II) campal Elec|ram 111
II) 5111-9110 onlme 5. JP
I1) C1500 syaens Inc us
Is) Warned Semic. 111
II) Mvcran Techmla. us
I1) Camnuler Sclenc
I1) Flex Lid us
I111me1 Corn 115
How can I improve the accuracy? Can I achieve 80-90% accuracy? All my images are in the same format, so is there a way to take advantage of that for my use case? Any suggestions will help.
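To make the 80-90% target concrete, one option is to score the OCR output against a hand-typed reference for a few images. The following is only a minimal sketch using the standard-library difflib module; the function name char_accuracy is hypothetical, and the reference string in the example is an assumed guess at what one row says, not text confirmed from the image.

```python
from difflib import SequenceMatcher


def char_accuracy(reference, ocr_output):
    """Fraction of reference characters that also appear, in order, in the OCR output.

    This is one possible definition of accuracy (matched characters divided by
    reference length); other metrics, such as edit distance, would give
    different numbers.
    """
    if not reference:
        # An empty reference is fully matched only by an empty OCR result.
        return 1.0 if not ocr_output else 0.0
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    # Sum the sizes of all matching blocks found by difflib.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


if __name__ == "__main__":
    # ASSUMPTION: the reference text is a guess typed by hand for illustration.
    reference = "Flex Ltd US"
    ocr_output = "Flex Lid us"  # the corresponding fragment of the OCR output above
    print(char_accuracy(reference, ocr_output))
```

Averaging this score over a handful of images before and after any change to the pipeline would show whether the change actually helps.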
Update: I tried using OCR.space, but it didn't work on the following image at all: