
I'm writing a script that works with tesseract-ocr. I get text from the screen and then need to compare it with a string. The problem is that the comparison fails even though I'm sure the strings are the same.

How can I make my code work?

Here is my code:

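import pyscreenshot as pss
import time
from pytesser import image_to_string

# Labels to look for in the OCR output
buy = "VENDI"
buyNow = "VENDI ADESSO"

if __name__ == '__main__':
    while True:
        time.sleep(2)
        # Grab a 206x30 screen region whose top-left corner is (1104, 422)
        image = pss.grab(bbox=(1104, 422, 1104 + 206, 422 + 30))
        # OCR the captured region (the original passed an undefined name, im)
        text = str(image_to_string(image))
        print(text)
        if text == buy or text == buyNow:
            print('ok')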

For example, with this input:

Input image sample

I get this output:

VENDI ADESSO

This is exactly the string I'm comparing against, so why isn't ok printed to the console?

Matze
Marco
  • Please share some input. – BeerBaron Aug 29 '17 at 10:13
  • Would you mind giving a sample output? – Eduard Aug 29 '17 at 10:13
  • Try printing `repr(text)` and see if there is anything unexpected in there. – khelwood Aug 29 '17 at 10:14
  • Try `print text, len(text)` to check for "hidden" characters (e.g. a space or an end-of-line character). – CristiFati Aug 29 '17 at 10:15
  • Please use meaningful names for your variables and functions. It makes your point easier to understand. – Right leg Aug 29 '17 at 10:17
  • What do you mean by "the strings are the same"? Are your two python objects exactly the same? Or do you just know that the input image should be the same as the string? – Arthur Spoon Aug 29 '17 at 10:18
  • OK, I've translated them into English; they were in Italian :). I know that they are the same. I've added some input and output samples. – Marco Aug 29 '17 at 10:25
  • Check whether the string is Unicode, and also check its length. Use text.strip() to eliminate extra whitespace. Since you are taking a screenshot, the text might contain more characters than you expect. – Bhuvan Kumar Aug 29 '17 at 10:27
  • I checked whether both strings are Unicode, and they are. I also tried to eliminate extra spaces, but it doesn't work. – Marco Aug 29 '17 at 10:39
  • @khelwood I tried repr(text) and got: VENDI ADESSO\n\n – Marco Aug 29 '17 at 10:45
  • Your string has two new lines (`\n`) at the end. You can use `text = text.strip()` to remove any surrounding whitespace from your strings. – khelwood Aug 29 '17 at 10:48
  • OK, perfect, now it works. I was using strip() incorrectly. – Marco Aug 29 '17 at 10:58
  • For me, repr(text) was helpful. I had a \ufeff character (aka ZERO WIDTH NO-BREAK SPACE) at the start, and strip() didn't get rid of that either. So I ended up chaining this: `.strip().replace('\ufeff','')` – JGFMK Jun 01 '23 at 13:26
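Here is a minimal Python 3 sketch of the diagnostics suggested in the comments. The sample strings are hypothetical stand-ins, modeled on the outputs reported above (`VENDI ADESSO\n\n` and a leading `\ufeff`). `show_hidden` is an illustrative helper introduced here, not part of any library.

```python
def show_hidden(text):
    """Return (repr, length) so that invisible characters become visible."""
    return repr(text), len(text)


# Hypothetical OCR results, modeled on the outputs reported in the comments.
trailing_newlines = "VENDI ADESSO\n\n"
leading_bom = "\ufeffVENDI ADESSO"

print(show_hidden(trailing_newlines))  # the \n\n shows up in the repr
print(show_hidden(leading_bom))        # so does \ufeff

# strip() removes the newlines, but U+FEFF is not whitespace, so it survives.
print(trailing_newlines.strip() == "VENDI ADESSO")  # True
print(leading_bom.strip() == "VENDI ADESSO")        # False

# Chaining replace() after strip(), as in the last comment, removes it too.
print(leading_bom.strip().replace("\ufeff", "") == "VENDI ADESSO")  # True
```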

1 Answer


As it turns out, your string has two newlines (\n\n) at the end.

You can use

text = text.strip()

to remove any surrounding whitespace from your string.
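Below is a minimal Python 3 sketch of this fix applied to the comparison from the question. It assumes the OCR output is already a string; a hard-coded value stands in for `image_to_string(image)`. The names `matches_expected` and `EXPECTED_LABELS` are hypothetical and introduced here only for illustration.

```python
# Labels to look for (taken from the question's code).
EXPECTED_LABELS = {"VENDI", "VENDI ADESSO"}


def matches_expected(ocr_text, expected=EXPECTED_LABELS):
    """Return True if the OCR text equals one of the expected labels.

    strip() removes surrounding whitespace, including the trailing
    newlines that the OCR output carries.
    """
    return ocr_text.strip() in expected


# Stand-in for image_to_string(image); the real OCR call is not run here.
raw = "VENDI ADESSO\n\n"
print(raw == "VENDI ADESSO")   # False: the trailing "\n\n" makes the strings differ
print(matches_expected(raw))   # True after stripping
```

A set of expected labels reduces the check to a single membership test instead of a chain of `or` comparisons. If you only want to drop line endings and keep any other surrounding spaces, `rstrip("\r\n")` is a narrower alternative to `strip()`.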

khelwood
  • This happens with certain CSV files: you need to use strip(), as shown above, before comparing. The values look EXACTLY the same, but the CSV value has a trailing newline. – smoore4 Jan 20 '21 at 21:17
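This sketch shows one way the problem can arise. It assumes the file is read line by line; the comment does not state the exact setup. Iterating over a text file yields each line with its trailing newline. Here `io.StringIO` stands in for a real file, and the contents are hypothetical.

```python
import io

# Hypothetical file contents; io.StringIO stands in for a file from open().
data = "VENDI\nVENDI ADESSO\n"

# Iterating over a text file yields each line WITH its trailing newline.
lines = list(io.StringIO(data))
print(lines[0] == "VENDI")          # False: the line is "VENDI\n"
print(lines[0].strip() == "VENDI")  # True

# Strip every line once, up front, before doing any comparisons.
cleaned = [line.strip() for line in lines]
print(cleaned)
```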