I am trying to extract emails from screenshots.
This is the image: Image with email
As you can see, the image contains an email address.
This is my code:
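import cv2
import pytesseract

image = cv2.imread('image_name.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu threshold with inverted output, then invert again (same result as THRESH_BINARY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
thresh = 255 - thresh
# --psm 6: treat the image as a single uniform block of text
text = pytesseract.image_to_string(thresh, config='--psm 6')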
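For context, this is roughly how the email could then be pulled out of `text`. It is only a minimal sketch: the pattern is a simplified assumption rather than a full email validator, and `find_emails` and `sample_text` are hypothetical names used for illustration.

```python
import re

# Simplified email pattern (an assumption; real addresses can be more varied).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(ocr_text):
    """Return every email-like substring found in the OCR output."""
    return EMAIL_RE.findall(ocr_text)


# Made-up string standing in for the `text` returned by image_to_string.
sample_text = "Contact us at john.doe15@example.com for details."
print(find_emails(sample_text))
```

Because the pattern only checks the overall shape of the address, a misread digit such as 1 read as 't' still passes it, which is why the OCR step itself has to be right.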
I have tried everything from grayscale conversion to thresholding to inversion, along with every other pre-processing technique I could find, but nothing seems to work.
Earlier, it was detecting 5 as 's' and 1 as 'i'. After pre-processing the image as shown above, the problem with 5 is resolved, but 1 is now detected as 't'. Please help.
Edit 1: First of all, I am a complete beginner, so I might say something that sounds naive in the programming world. Please bear with me.
These are some of the results of the image_to_data function on the image: email email string itself & contact yet
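To make results like these easier to read, the per-word confidences can be filtered. The following is a minimal sketch, assuming the data has the shape of the dict that `pytesseract.image_to_data` returns with `output_type=pytesseract.Output.DICT`. `low_confidence_words`, the threshold of 60, and the sample values are all made up for illustration and are not real output from the image.

```python
def low_confidence_words(data, threshold=60.0):
    """Return (word, confidence) pairs whose confidence is below threshold.

    `data` is assumed to have parallel "text" and "conf" lists, as in the
    dict form of pytesseract.image_to_data output.
    """
    flagged = []
    for word, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may arrive as str, int, or float
        # Entries with conf -1 are layout rows (blocks, lines), not words.
        if word.strip() and 0 <= conf < threshold:
            flagged.append((word, conf))
    return flagged


# Hand-written sample standing in for real OCR output (values are invented).
sample = {
    "text": ["", "email", "user5t@example.com", "contact"],
    "conf": ["-1", "96", "41", "88"],
}
print(low_confidence_words(sample))
```

With the real image, the dict would come from `pytesseract.image_to_data(thresh, config='--psm 6', output_type=pytesseract.Output.DICT)`.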
I would have posted the pre-processed image, but I get this error when I try to run cv2.imshow():
The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvShowImage'
I am running Jupyter Notebook on Anaconda, which could be the reason for this error.
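One possible way around this inside a notebook is to draw the image with matplotlib instead of cv2.imshow(). This is a minimal sketch, assuming matplotlib and NumPy are installed. `show_gray` is a hypothetical helper, and `demo` is a synthetic array standing in for `thresh`.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_gray(img, title=""):
    """Draw a single-channel image with matplotlib and return the figure."""
    fig, ax = plt.subplots()
    # Fix the display range so a binary image is not rescaled.
    ax.imshow(img, cmap="gray", vmin=0, vmax=255)
    ax.set_title(title)
    ax.axis("off")
    return fig


# Synthetic stand-in for `thresh`: a white image with a black bar.
demo = np.full((40, 120), 255, dtype=np.uint8)
demo[15:25, 10:110] = 0
fig = show_gray(demo, "Image After Processing")
```

In a notebook cell the figure is normally rendered inline. Writing the array to disk with `cv2.imwrite('processed.png', thresh)` is another option that does not need a GUI backend.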
Here is the image after processing: Image After Processing