Timeout issue while running a Python script with PhantomJS and Selenium

Question

I am running a Python script with PhantomJS and Selenium, and I am facing a timeout issue: the script stops after 20 to 50 minutes. I need a way to run my script without hitting this timeout. Where is the problem, and how can I solve it?

    The input file cannot be read or no in proper format.
    Traceback (most recent call last):
      File "links_crawler.py", line 147, in <module>
        crawler.Run()
      File "links_crawler.py", line 71, in Run
        self.checkForNextPages()
      File "links_crawler.py", line 104, in checkForNextPages
        self.next.click()
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 75, in click
        self._execute(Command.CLICK_ELEMENT)
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 454, in _execute
        return self._parent.execute(command, params)
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 199, in execute
        response = self.command_executor.execute(driver_command, params)
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
        return self._request(command_info[0], url, body=data)
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
        resp = opener.open(request, timeout=self._timeout)
      File "/usr/lib/python2.7/urllib2.py", line 431, in open
        response = self._open(req, data)
      File "/usr/lib/python2.7/urllib2.py", line 449, in _open
        '_open', req)
      File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 1227, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib/python2.7/urllib2.py", line 1200, in do_open
        r = h.getresponse(buffering=True)
      File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
        response.begin()
      File "/usr/lib/python2.7/httplib.py", line 453, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
        raise BadStatusLine(line)
    httplib.BadStatusLine: ''

Code:

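import os
import re

from selenium import webdriver


class Crawler(object):
    def __init__(self, where_to_save, verbose=0):
        self.link_to_explore = ''
        # Strips any HTML tag.
        self.TAG_RE = re.compile(r'<[^>]+>')
        # Matches whole <script>...</script> blocks; re.S lets '.' span newlines.
        # Passing the flag avoids a mid-pattern "(?s)", which Python 3.11+ rejects.
        self.TAG_SCRIPT = re.compile(r'<(script).*?</\1>', re.S)
        # Visible Firefox for debugging, headless PhantomJS otherwise.
        if verbose == 1:
            self.driver = webdriver.Firefox()
        else:
            self.driver = webdriver.PhantomJS()
        self.links = []
        self.next = True
        self.where_to_save = where_to_save
        self.logs = os.path.join(self.where_to_save, "logs")
        self.outputs = os.path.join(self.where_to_save, "outputs")
        self.logfile = ''
        self.rnd = 0
        # Create the log and output directories if they don't exist yet.
        for directory in (self.logs, self.outputs):
            if not os.path.exists(directory):
                os.makedirs(directory)

    # Init(), Run(), checkForNextPages(), End() and closeSpider() are not shown.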

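# file_to_read and where_to_save are set earlier in the script (not shown).
try:
    with open(file_to_read, "r") as fin:
        file_content = fin.read()
    crawler = Crawler(where_to_save)
    data = file_content.split("\n")
    for info in data:
        if info != "":
            # Each line is expected to look like "link | category".
            to_process = info.split("|")
            link = to_process[0].strip()
            category = to_process[1].strip().replace(' ', '_')
            print("Processing the link: " + link + " : " + info)
            crawler.Init(link, category)
            crawler.Run()
            crawler.End()
    crawler.closeSpider()
except Exception:
    # This catch-all also fires for errors raised inside the crawler (such as
    # the BadStatusLine above), so the message can be misleading; re-raising
    # keeps the real traceback.
    print("The input file cannot be read or no in proper format.")
    raise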
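The loop above assumes every non-blank line has the form `link | category`; a line without a `|` raises `IndexError`, which the catch-all `except` then reports as an unreadable file. A minimal sketch of a stricter parser, assuming that same one-record-per-line format (`parse_input` is a hypothetical helper, not part of the original script):

```python
def parse_input(text):
    """Parse 'link | category' lines into (link, category) tuples.

    Blank lines are ignored; malformed lines (no '|') are collected
    separately so the caller can report them instead of crashing.
    """
    records = []
    malformed = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines, as the original loop does
        parts = line.split("|")
        if len(parts) < 2:
            malformed.append((line_no, line))
            continue
        link = parts[0].strip()
        # Same normalisation as the original: spaces become underscores.
        category = parts[1].strip().replace(" ", "_")
        records.append((link, category))
    return records, malformed


if __name__ == "__main__":
    sample = "http://example.com/a | Home Garden\n\nbad line without separator\n"
    records, malformed = parse_input(sample)
    print(records)
    print(malformed)
```

Reporting the malformed lines separately means a bad input line no longer hides behind the same message as a crawler failure.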

Yeah, the question needs some code, or at least something to provide context for the error. But while we wait: you're not using the [Apache Requests library](http://docs.python-requests.org/en/latest/), by any chance, are you? — David Z, Nov 20 '15 at 09:22
@DavidZ, I installed the requests module on my virtual machine earlier. — rhb1, Nov 20 '15 at 09:42

Raghav Sharma · Answer 1 · 2015-11-20T19:39:36.927


If you don't want a timeout to stop your script, you can catch the `selenium.common.exceptions.TimeoutException` exception and ignore it with `pass`.

You can set the page load timeout with the WebDriver's `set_page_load_timeout()` method, like this:

driver.set_page_load_timeout(10)

This raises a `TimeoutException` if the page doesn't finish loading within 10 seconds.

EDIT: I forgot to mention that you will need to put your code in a loop.

Add the import:

from selenium.common.exceptions import TimeoutException

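while True:
    try:
        # Your code here
        break  # success: exit the loop
    except TimeoutException:
        # Too slow: ignore the timeout and try again.
        # Note that this retries forever if the page never loads.
        pass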
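Because `while True` with `except TimeoutException: pass` retries forever if a page never loads, a bounded variant may be safer. A minimal sketch written as a generic helper, so it can wrap any single step; `retry_call`, `max_attempts` and `delay` are hypothetical names, and which exception types count as transient is an assumption you pass in:

```python
import time


def retry_call(func, retry_on, max_attempts=3, delay=0.0):
    """Call func() until it succeeds, retrying only on the given exceptions.

    retry_on: an exception class or tuple of classes considered transient.
    After max_attempts failures the last exception is re-raised, so a
    permanently broken page cannot hang the script in an endless loop.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the real error
            time.sleep(delay)  # optional pause between attempts


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Fails twice, then succeeds: stands in for a slow page load.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ValueError("transient failure")
        return "loaded"

    print(retry_call(flaky, ValueError, max_attempts=5))
```

With Selenium this could be used as `retry_call(lambda: driver.get(url), TimeoutException)` after `driver.set_page_load_timeout(10)`. Note that the error in the question's traceback is `httplib.BadStatusLine`, not a `TimeoutException`, so catching `TimeoutException` alone would not have caught it.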

edited Nov 20 '15 at 19:39

answered Nov 20 '15 at 09:27

Raghav Sharma


The `except selenium.common.exceptions.TimeoutException: pass` code above is the last part of my whole script. Should it let the script continue rather than raise that timeout exception? – rhb1 Nov 20 '15 at 11:44
@rhb1 You'll have to put your code in a loop. Look at my updated post. Also, it would help if you could show your code. – Raghav Sharma Nov 20 '15 at 12:58
@rhb1 Need the crawler class. That's where the error occurs. – Raghav Sharma Nov 20 '15 at 13:27
Please check my crawler class; I have added it. – rhb1 Nov 20 '15 at 13:44
You said there is a TimeoutException at the end of your script. Where is that code? – Raghav Sharma Nov 20 '15 at 13:50
@rhb1 Updated. Try now. – Raghav Sharma Nov 20 '15 at 14:36
I have updated the code according to your explanation, but I am getting the same error. Please check whether the code looks OK now: https://jumpshare.com/v/lELk3wApvLYaAoEqbKGD – rhb1 Nov 20 '15 at 17:25
@rhb1 Your problem isn't about a timeout at all. The server is returning an invalid response (`httplib.BadStatusLine: ''` means the HTTP status line came back empty). – Raghav Sharma Nov 20 '15 at 19:37
Is there any way to recover from this issue? :( – rhb1 Nov 22 '15 at 18:21
I have already identified the links for which I get these bad responses. Can I block those specific links? – rhb1 Nov 22 '15 at 18:22
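Known-bad links can be skipped before the crawler ever touches them. A minimal sketch, assuming the bad links are kept as a set of exact URLs; `BLOCKED_LINKS`, `crawl_all` and `process` are hypothetical names, with `process` standing in for the original `crawler.Init(...)`, `crawler.Run()`, `crawler.End()` sequence. It also logs and moves on when a single link fails, instead of letting one error end the whole run:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("crawler")

# Hypothetical set of links already known to produce bad responses.
BLOCKED_LINKS = {
    "http://example.com/broken-page",
}


def crawl_all(records, process, blocked=BLOCKED_LINKS):
    """Run process(link, category) for each record, skipping blocked links.

    Returns (done, skipped, failed) lists so the caller can report them.
    """
    done, skipped, failed = [], [], []
    for link, category in records:
        if link in blocked:
            skipped.append(link)
            continue
        try:
            process(link, category)
        except Exception:  # broad on purpose: one bad link must not stop the run
            log.exception("Failed on %s", link)
            failed.append(link)
        else:
            done.append(link)
    return done, skipped, failed
```

This only isolates failures; if the underlying PhantomJS session itself stops responding, later links may fail as well.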
