I'm using pytesser on simple images with plain text. It works great! However, when I print the result in Python, each line of text appears on its own line, yet the string itself seems to contain no "\n" or other newline delimiters that I can pull out.

How does it print each line of the image on a new line in the console? And is there a way I can pull out a particular line, or split the lines myself?

It's more than likely something very simple I'm missing...

from pytesser import *
image = Image.open('image.jpg')

text = image_to_string(image)

print len(text)
print text

Output:

983
BACK RASHER 1.24
T CREAM 250ML 1.19
T COFFEE 200G 1.09
PEANUT BUTTER 1.12
DIET COKE * 2.39
Ciaran
  • Note: I know I can search the string for certain things using .split() or .strip() to segment each new line, but I'm wondering what is inherent in the output itself that marks where a new line begins. – Ciaran May 27 '15 at 13:42
  • what is the expected output? – ZdaR May 27 '15 at 13:45
  • Try `print(repr(text))`. You will see whether "\n" characters are present or not. – dlask May 27 '15 at 13:47
  • My .split("\n") didn't work originally for some reason (knowing me, I typed it with a forward slash or something). Sorry for such a simple question! I have never used repr(). Can I assume that it shows you the string as the interpreter reads it? If so, this will be very useful in general! – Ciaran May 27 '15 at 13:51

1 Answer

Thanks to dlask for pointing out my mistake. repr() shows the string with its escape sequences written out, including the "\n" newline delimiters, whereas print renders each "\n" as an actual line break. Using text.split("\n"), I can then split the output up line by line. Thanks, dlask!

from pytesser import *
image = Image.open('image.jpg')  # Open image object using PIL

text = image_to_string(image)    # Run tesseract.exe on image

print(repr(text))                # Show the raw string, including "\n"
result = text.split("\n")        # One list entry per line of text

print(result)
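
For anyone who wants to try this without pytesser installed, here is a minimal sketch of the same idea. The `text` value is a hand-written stand-in for the OCR result (an assumption, not real pytesser output), and the trailing blank line is included only to show how empty entries can be filtered out. It uses `splitlines()` rather than `split("\n")`, which is a swapped-in alternative that also copes with "\r\n" line endings.

```python
# Hand-written sample standing in for the OCR result (an assumption).
text = "BACK RASHER 1.24\nT CREAM 250ML 1.19\nT COFFEE 200G 1.09\n\n"

print(text)        # each "\n" is rendered as a line break
print(repr(text))  # each "\n" is shown as a visible escape sequence

# Split into lines, trimming whitespace and dropping empty entries.
lines = [line.strip() for line in text.splitlines() if line.strip()]

# Pull out a particular line by index (0-based).
second_line = lines[1]
print(second_line)
```

Filtering out the empty strings is a design choice: it keeps the indices aligned with the visible lines of text, so `lines[1]` is always the second non-blank line.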
Ciaran