
I am trying to convert the attached scanned JPEG file to text with OCR. When I use pytesseract or tesseract, the output contains stray diacritics and a lot of junk characters, so the conversion from JPEG to text does not produce usable output.

I read the image file, extract the text, and type it out as keystrokes. The output is not what I expect.

The code is as follows:

from PIL import Image
from pytesseract import image_to_string
import keyboard

image = Image.open('8001.jpg')
text = image_to_string(image, lang='eng')
keyboard.write(text)

I am getting some unwanted characters like these:

>) ) 7? ) 7 0 Daybreak: appeared. Ihe mowing miosls ourvounded us, bub Urey 2001 cleared ch J Wea

> pm 0. 0 ) ) aeaboul lo examine the hull, which formed on deely a kind of horizontal 2

fatfoun, w fen a J felt ils op nel, kicking the resounding plate. “Open,

) me " 57 gradually sinking. Oh! confound i! cried Nod

0 Q yi you inhoapitable zasealy!

Says Pp iy ui

0 0 cide, came from the interior of the Boal. One iton plate was moved, a men appeared, ullered
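
To make the symptom concrete, here is a minimal sketch of a post-filter, assuming the junk is mostly stray symbols and digits like those at the start of the lines above. The names `strip_noise_tokens` and `clean_ocr_text` are hypothetical helpers, not part of pytesseract. The filter only hides noise tokens and does nothing for misrecognized words such as "Ihe" or "miosls".

```python
import re

# Assumption: a token is treated as a real word only if it contains
# at least two letters in a row; everything else is considered noise.
WORD_RE = re.compile(r"[A-Za-z]{2,}")


def strip_noise_tokens(line: str) -> str:
    """Drop whitespace-separated tokens that contain no run of 2+ letters."""
    kept = [tok for tok in line.split() if WORD_RE.search(tok)]
    return " ".join(kept)


def clean_ocr_text(text: str) -> str:
    """Filter every line with strip_noise_tokens and drop lines left empty."""
    cleaned = (strip_noise_tokens(line) for line in text.splitlines())
    return "\n".join(line for line in cleaned if line)
```

With the code from the question, this would be called as `keyboard.write(clean_ocr_text(text))`. Note that the two-letter rule also removes legitimate one-letter words such as "a" and "I", so it is a diagnostic aid, not a fix for the recognition itself.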

Matthias Braun