
I'm writing a program in Python 3.x that uses webbot. Among other things, its job is to look for certain text patterns in a page full of text and generate results based on whether it finds a match. So I have a bunch of lines that look like this:

if web.exists("hi I'm some text", loose_match=False) == True:
    ifoundthetext = 1
else:
    ifoundthetext = 0

This does work, but each individual search takes a second or two. I have many different texts to search for, so by the time the program gets through all of them, each cycle takes about 10 seconds. To make matters worse, the longer I leave the program running (especially the compiled .exe version), the slower it gets: over a minute per cycle if it has been running overnight. And yes, I am using garbage collection.

I've tried combining the checks with `or`, like this:

if web.exists("hi I'm some text", loose_match=False) == True or web.exists("hi I'm some other text", loose_match=False) == True:

But this has no effect on the program's speed. How can I make this faster without ditching webbot?
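
Part of the reason `or` doesn't help is that it only short-circuits: Python evaluates the operands left to right and stops at the first truthy one, so the second search is skipped only when the first text is found. The sketch below uses a hypothetical `fake_exists` stand-in for `web.exists()` (it just logs each search it is asked to do) to show that when the first text is absent, both searches still run.

```python
calls = []

def fake_exists(text, loose_match=False):
    """Hypothetical stand-in for web.exists(): logs each search it performs."""
    calls.append(text)
    return text == "hi I'm some other text"

# `or` evaluates left to right and stops at the first truthy operand.
combined = (fake_exists("hi I'm some text", loose_match=False)
            or fake_exists("hi I'm some other text", loose_match=False))

# The first text is absent, so both searches still ran.
print(combined, len(calls))  # True 2
```

So combining checks with `or` can save searches only when an early text matches; it never makes an individual webbot search faster.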

츄 plus
  • Can you search using an id or a selector, something like `web.exists(id='elementid')`? – Madison Courto Oct 03 '19 at 05:12
  • Just commenting that while you can greatly simplify this code (i.e. make `ifoundthetext` a Boolean; it's more appropriate than an `int`: `ifoundthetext = web.exists("hi I'm some text", loose_match=False)` to replace everything), the overhead of the extra code is negligible. The bulk of the time is spent searching the text. I don't know how the library is implemented, but it would be helpful for you to include your full code to see if there's a possible logical improvement. – iz_ Oct 03 '19 at 05:14
  • I can't use element IDs in this case, because there are several places where the text could appear on the page and the IDs won't be consistent. – 츄 plus Oct 03 '19 at 05:18
  • The logic of the rest of the program shouldn't matter, because if I remove these lines the rest of the program runs exceptionally quickly. It's just the text search itself that is slow. So either I'm searching in a sub-optimal way, or there's something inside webbot.py that could be improved (I hope). Webbot is here: https://github.com/nateshmbhat/webbot – 츄 plus Oct 03 '19 at 05:21
  • Looking at the source code of `webbot`, it's littered with bad coding style. However, if you're stuck with using it, try utilizing short circuiting if `ifoundthetext` should be `True` if *any* text in a predefined set is found. You can do this with the `any` function combined with a generator expression. – iz_ Oct 03 '19 at 05:50
  • Can you show me what that looks like? I assume I create a list like `listofStuff = ["stuff","things"]` but wouldn't the use of `any` negate the need for a `for` loop? – 츄 plus Oct 03 '19 at 23:19
  • The other thing I have to consider is that it's not enough for me to know that a match was found, I want to know exactly which chunk of text was a match. Also, "any" short-circuits, but I want the loop to keep going because if there's a second bit of text that also generates a match, I want to know about that too, and what it is. – 츄 plus Oct 04 '19 at 03:14
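
The `any` suggestion and the follow-up about knowing *which* texts matched can be sketched as follows. This assumes, as in the question, that `web.exists(text, loose_match=False)` returns a truthy value when the text is on the page; `fake_exists` and `PAGE_TEXT` are hypothetical stand-ins so the snippet runs without a browser.

```python
# Hypothetical stand-in for web.exists(...) so this runs without a browser;
# in the real program you would call the webbot method instead.
PAGE_TEXT = "intro hi I'm some other text outro"

def fake_exists(text, loose_match=False):
    return text in PAGE_TEXT

texts = ["hi I'm some text", "hi I'm some other text", "stuff"]

# any() stops at the first match (short-circuits): answers "was anything found?"
found_any = any(fake_exists(t, loose_match=False) for t in texts)

# A list comprehension checks every text and keeps the ones that matched,
# which answers "exactly which chunk of text was a match?"
matches = [t for t in texts if fake_exists(t, loose_match=False)]

print(found_any)  # True
print(matches)    # ["hi I'm some other text"]
```

`any` stops at the first hit, so it only tells you whether something was found. The list comprehension tells you which texts matched, but it still calls `exists` once per text, so with webbot it costs as many searches as the original chain of `if` statements.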

1 Answer


The comments have made it clear that the deficiency is in webbot itself, which is just too slow for all that repetitive searching.

I've reluctantly rewritten the entire program in Selenium, but I'm glad I did as it performs much faster!

I also switched to Booleans, just for neatness.

The equivalent code in my Selenium version:

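from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

try:
    # Matches only the visible text of <a> links, not arbitrary page text.
    driver.find_element(By.PARTIAL_LINK_TEXT, "hi I'm some text")
    ifoundthetext = True
    print('yay')
except NoSuchElementException:
    ifoundthetext = False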
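
Since the Selenium version still needs to check many texts and know which ones matched, one option is to fetch the page's visible text once and do the substring checks in plain Python. That should need a single WebDriver round trip instead of one per text. The sketch below assumes that approach; `find_matches` is a hypothetical helper, and a literal string stands in for the page text so it runs without a browser. Like the partial-link-text lookup, it is a case-sensitive substring test, but it also sees text outside links.

```python
def find_matches(page_text, texts):
    """Hypothetical helper: return every text in `texts` found in `page_text`."""
    return [t for t in texts if t in page_text]

# In the Selenium version, page_text would come from a single call such as:
#   page_text = driver.find_element(By.TAG_NAME, "body").text
# Here a literal string stands in so the sketch runs without a browser.
page_text = "Welcome!\nhi I'm some text\nGoodbye"
texts = ["hi I'm some text", "hi I'm some other text"]

matches = find_matches(page_text, texts)
ifoundthetext = bool(matches)
print(matches)        # ["hi I'm some text"]
print(ifoundthetext)  # True
```
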
츄 plus