pytesser - next line of text in image?

Question

I'm using pytesser on simple images with plain text. It works great! However, when I print the result in Python, each line of text appears on a new line, yet the string it returns doesn't seem to contain any "\n" or other newline delimiters that I can pull out.

How does it print each line of the image on a new line in the console? And is there a way I can pull out a particular line, or split the lines myself?

It's more than likely something very simple I'm missing...

from pytesser import *
image = Image.open('image.jpg')  # Open image object using PIL

text = image_to_string(image)    # Run tesseract on the image

print(len(text))
print(text)

Output:

983
BACK RASHER 1.24
T CREAM 250ML 1.19
T COFFEE 200G 1.09
PEANUT BUTTER 1.12
DIET COKE * 2.39

Note: I know I can search the string for certain things using .split() or .strip() to segment each new line, but I'm wondering what's inherent in the output itself that tells it where a new line starts. — Ciaran, May 27 '15 at 13:42
Try `print(repr(text))`. You will see whether "\n" characters are present or not. — dlask, May 27 '15 at 13:47
My .split("\n") didn't work originally for some reason (knowing me, I typed it with a forward slash or something). Sorry for such a simple question! I have never used repr(). Can I assume that it shows you the string as the interpreter sees it? If so, this will be very useful in general! — Ciaran, May 27 '15 at 13:51
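A minimal, self-contained sketch of what dlask's suggestion reveals. It uses a hard-coded string in place of real OCR output (the sample text is an assumption modelled on the output above), so it runs without pytesser or tesseract installed:

```python
# Hypothetical sample standing in for image_to_string() output;
# the exact text is an assumption based on the output shown above.
text = "BACK RASHER 1.24\nT CREAM 250ML 1.19\nT COFFEE 200G 1.09\n"

# print() renders each "\n" as a line break, so the lines look separate.
print(text)

# repr() shows the escape sequences instead of rendering them.
print(repr(text))
# 'BACK RASHER 1.24\nT CREAM 250ML 1.19\nT COFFEE 200G 1.09\n'

# For a str, repr() produces a valid Python literal that evaluates back
# to an equal string.
assert eval(repr(text)) == text

# The newline characters are ordinary characters and count toward len().
print(text.count("\n"))  # 3
```

So the console is not doing anything special: the line breaks are "\n" characters inside the string, and print() simply displays them as new lines.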

score 1 · Accepted Answer · answered May 27 '15 at 15:21

Thanks to dlask for pointing out my mistake. repr() shows the string as the interpreter sees it, with escape sequences such as the newline "\n" visible instead of rendered as line breaks. Using text.split("\n"), I can then split the output up line by line. Thanks, dlask!

from pytesser import *
image = Image.open('image.jpg')  # Open image object using PIL

text = image_to_string(image)    # Run tesseract.exe on image

print(repr(text))                # Show the "\n" delimiters explicitly
result = text.split("\n")        # One list item per line of text

print(result)
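To pull out a particular line, as asked in the question, a small sketch (not part of the original answer) follows. It uses a hypothetical OCR string and assumes the output may contain blank lines and a trailing newline; it contrasts split("\n") with str.splitlines():

```python
# Hypothetical OCR result; a blank line and a trailing newline are assumed
# here to show how each splitting method treats them.
text = "BACK RASHER 1.24\nT CREAM 250ML 1.19\n\nDIET COKE * 2.39\n"

# split("\n") keeps an empty string after a trailing newline
# and for any blank line in the middle.
print(text.split("\n"))
# ['BACK RASHER 1.24', 'T CREAM 250ML 1.19', '', 'DIET COKE * 2.39', '']

# splitlines() does not produce a trailing empty item and also
# recognises "\r\n" line endings.
lines = text.splitlines()
print(lines)
# ['BACK RASHER 1.24', 'T CREAM 250ML 1.19', '', 'DIET COKE * 2.39']

# Drop blank lines (including whitespace-only ones) before indexing.
non_empty = [line for line in lines if line.strip()]

# Pull out a particular line by its zero-based index.
second_line = non_empty[1]
print(second_line)  # T CREAM 250ML 1.19
```

Filtering out blank lines first keeps the index stable even if tesseract inserts extra empty lines between blocks of text.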
