I applied Otsu thresholding to this Bengali text image and ran Tesseract OCR on it, but the output is very poor. What preprocessing should I apply to remove the noise? I would also like to deskew the image, as it is slightly skewed. My code is given below:
import tesserocr
from PIL import Image
import cv2
import codecs
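# Load the image as grayscale (flag 0).
image = cv2.imread("crop2.bmp", 0)
# With THRESH_OTSU the threshold is computed automatically, so the 128 passed here is ignored.
(thresh, bw_img) = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)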
img = Image.fromarray(bw_img)
text = tesserocr.image_to_text(img, lang='ben')
# Write the recognised text as UTF-8; the with-block closes the file even if writing fails.
with codecs.open("output_text", "w", "utf-8") as out_file:
    out_file.write(text)