
I have an image file that contains some text in columns separated by tabs (2 spaces). But when I extract the text from this image, I always get a single space between two columns. An example:

IMAGE:

col-a    col-b    col-c

Desired output:

col-a    col-b    col-c

But I am getting the following:

col-a col-b col-c

I am using pytesseract.image_to_string (Python module) to convert the image to text.

Zoe
raghu

1 Answer


Use it like this:

pytesseract.image_to_string(img, config='-c preserve_interword_spaces=1')
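
A slightly fuller sketch of the same idea, in case it helps. The helper names (`build_config`, `split_columns`, `ocr_with_spacing`) are made up for illustration, and `--psm 6` (treat the image as one uniform block of text) is an assumption about the layout, not something the question states. Whether the flag has any effect also depends on the installed Tesseract version, as the comments below discuss.

```python
import re


def build_config(preserve_spaces=True, psm=6):
    """Build a Tesseract config string (hypothetical helper).

    psm=6 is an assumption: it tells Tesseract to treat the image as a
    single uniform block of text.
    """
    flag = 1 if preserve_spaces else 0
    return f"--psm {psm} -c preserve_interword_spaces={flag}"


def split_columns(line):
    """Split one OCR'd line into cells on runs of two or more whitespace chars."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def ocr_with_spacing(image_path):
    """Run OCR on image_path while asking Tesseract to keep inter-word spacing."""
    # Imported lazily so the helpers above work without Tesseract installed.
    import pytesseract
    from PIL import Image

    img = Image.open(image_path)
    return pytesseract.image_to_string(img, config=build_config())
```

If the flag takes effect, a line such as `col-a    col-b    col-c` splits into three cells with `split_columns`, while the single-spaced `col-a col-b col-c` stays as one cell. That gives a quick way to check whether the spacing was actually preserved.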
Zoe
  • I tried this, but I am getting the same output in both cases, preserve_interword_spaces=1 and preserve_interword_spaces=0. – Resham Wadhwa Jan 17 '19 at 07:47
  • `preserve_interword_spaces=1` is no longer available with tesseract 4. Maybe it will be fixed later. – singrium Apr 11 '19 at 15:57
  • Hi guys, have you got this fixed? – Marcelo Gazzola Nov 22 '19 at 22:03
  • This has been fixed. See https://github.com/tesseract-ocr/tesseract/issues/781 – flipchart May 07 '20 at 12:35
  • What is the difference between tesseract and pytesseract? When I download pytesseract, the latest version is 0.3.7. When I download tesseract, the latest version is 0.1.3. On GitHub they say there is a version 4. How do I get version 4, and how do I get Python to use it? – bobsmith76 Dec 19 '20 at 03:41
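
On the last comment: pytesseract is only a Python wrapper. It calls a separately installed Tesseract executable, so the engine version is independent of the pytesseract version number. Here is a rough sketch for checking which engine is on your PATH. The helper names are made up for illustration, and the regex assumes the version banner starts with `tesseract <version>` (optionally with a `v` prefix).

```python
import re
import shutil
import subprocess


def parse_tesseract_version(banner):
    """Extract the dotted version number from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", banner)
    return match.group(1) if match else None


def installed_tesseract_version(executable="tesseract"):
    """Return the version of the Tesseract binary on PATH, or None if absent."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some builds print the banner to stderr, so check both streams.
    return parse_tesseract_version(result.stdout + result.stderr)
```

pytesseract also provides `pytesseract.get_tesseract_version()`, which reports the version of the executable it is configured to call.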