I have Tesseract running in Python via pytesseract.
Using an image of a newspaper article that contains around 600 words, the pytesseract.image_to_string function takes around 20 seconds to complete.
The eventual results are great, but at this speed it is of little practical use.
The image has a file size of 3.5 MB and a resolution of 3024 × 4032 (in case that is useful). It has already been preprocessed with OpenCV.
The run takes approximately 18 to 20 seconds both on my local machine and when deployed to Google Cloud Platform.
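For anyone who wants to reproduce the measurement, here is a minimal sketch of how the call could be timed. The helper names (`time_ocr`, `time_file`) and the example file name are placeholders of mine rather than part of any library, and the image is loaded with Pillow here purely for simplicity; only `pytesseract.image_to_string` is the actual call in question.

```python
import time


def time_ocr(ocr_fn, image, repeats=3):
    """Call ocr_fn(image) `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        ocr_fn(image)
        durations.append(time.perf_counter() - start)
    return durations


def time_file(path, repeats=3):
    """Time pytesseract.image_to_string on the image stored at `path`.

    Imports are done lazily so that time_ocr can be used without
    Pillow or pytesseract installed.
    """
    from PIL import Image
    import pytesseract

    image = Image.open(path)
    return time_ocr(pytesseract.image_to_string, image, repeats)
```

Calling something like `time_file("article.jpg")` (a hypothetical file name) returns one duration per run, which makes it easier to see whether the time is consistent across calls.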
Is there anything that anyone can recommend to speed up this process?
The pytesseract version used is 0.2.5.