I am extracting data from a scanned PDF with Tesseract OCR. The extraction works, but the accuracy is poor: in many places the output is wrong. Is it possible to get 100% accurate data using Python?
First I convert the PDF to JPG format, then I extract the text from the image using the pytesseract module.
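from PIL import Image
import pytesseract

# Run OCR on the JPG that was converted from the scanned PDF
image = Image.open(r"C:\Users\sumesh\Desktop\ip\ip\pdf11.jpg")
text = pytesseract.image_to_string(image)
# Remove line breaks so the output prints on a single line
text = text.replace("\n", "")
print(text)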
I expected the correct text from the PDF, but some characters are misrecognized. For example, z is shown as 2, 5 as s, 1 as I, etc.
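To make the confusions above concrete, here is a minimal sketch of a post-processing workaround. It assumes each token is either mostly digits or mostly letters, and it only covers the three confusions listed above. The names `DIGIT_FIXES`, `LETTER_FIXES`, `fix_token` and `fix_text` are hypothetical helpers, not part of pytesseract, and this does not make the OCR itself more accurate.

```python
import re

# Assumed confusion pairs, taken only from the examples in this question.
DIGIT_FIXES = str.maketrans({"s": "5", "I": "1"})  # letters seen where digits belong
LETTER_FIXES = str.maketrans({"2": "z"})           # digits seen where letters belong


def fix_token(token: str) -> str:
    """Repair a token only when it is clearly mostly digits or mostly letters."""
    digits = sum(ch.isdigit() for ch in token)
    letters = sum(ch.isalpha() for ch in token)
    if digits > letters:
        return token.translate(DIGIT_FIXES)
    if letters > digits:
        return token.translate(LETTER_FIXES)
    return token  # ambiguous (or no letters/digits): leave unchanged


def fix_text(text: str) -> str:
    """Apply fix_token to every whitespace-separated token, keeping the spacing."""
    parts = re.split(r"(\s+)", text)  # the capture group keeps the separators
    return "".join(fix_token(part) for part in parts)
```

It could be applied to the OCR result with `text = fix_text(text)`. Tokens with an equal number of letters and digits are left unchanged, because a rule like this cannot tell which reading is correct.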