I am running a Python script with PhantomJS and Selenium, and I am facing a timeout issue: the script stops after 20-50 minutes. Where is the problem, and how can I solve it so that the script runs without timing out?

 The input file cannot be read or no in proper format.
    Traceback (most recent call last):
      File "links_crawler.py", line 147, in <module>
        crawler.Run()
      File "links_crawler.py", line 71, in Run
        self.checkForNextPages()
      File "links_crawler.py", line 104, in checkForNextPages
        self.next.click()
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 75, in click
        self._execute(Command.CLICK_ELEMENT)
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 454, in _execute
        return self._parent.execute(command, params)
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 199, in execute
        response = self.command_executor.execute(driver_command, params)
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
        return self._request(command_info[0], url, body=data)
      File "/home/dev/.local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
        resp = opener.open(request, timeout=self._timeout)
      File "/usr/lib/python2.7/urllib2.py", line 431, in open
        response = self._open(req, data)
      File "/usr/lib/python2.7/urllib2.py", line 449, in _open
        '_open', req)
      File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 1227, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib/python2.7/urllib2.py", line 1200, in do_open
        r = h.getresponse(buffering=True)
      File "/usr/lib/python2.7/httplib.py", line 1127, in getresponse
        response.begin()
      File "/usr/lib/python2.7/httplib.py", line 453, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
        raise BadStatusLine(line)
    httplib.BadStatusLine: ''

Code:

import os
import re

from selenium import webdriver


class Crawler(object):
    def __init__(self, where_to_save, verbose=0):
        self.link_to_explore = ''
        # Patterns matching any HTML tag and whole <script> elements.
        self.TAG_RE = re.compile(r'<[^>]+>')
        self.TAG_SCRIPT = re.compile(r'<(script).*?</\1>', re.DOTALL)
        # verbose == 1 opens a visible Firefox window; otherwise use headless PhantomJS.
        if verbose == 1:
            self.driver = webdriver.Firefox()
        else:
            self.driver = webdriver.PhantomJS()
        self.links = []
        self.next = True
        self.where_to_save = where_to_save
        self.logs = self.where_to_save + "/logs"
        self.outputs = self.where_to_save + "/outputs"
        self.logfile = ''
        self.rnd = 0
        # Create the log and output directories if they do not exist yet.
        for directory in (self.logs, self.outputs):
            if not os.path.isdir(directory):
                os.makedirs(directory)

try:
    fin = open(file_to_read,"r")
    FileContent = fin.read()
    fin.close()
    crawler =Crawler(where_to_save)
    data = FileContent.split("\n")
    for info in data:
        if info!="":
            to_process = info.split("|")
            link =     to_process[0].strip()
            category = to_process[1].strip().replace(' ','_')
            print "Processing the link: " + link + " : " + info
            crawler.Init(link,category)
            crawler.Run()
            crawler.End()
    crawler.closeSpider()
except:
    print "The input file cannot be read or no in proper format."
    raise
rhb1

1 Answer

If you don't want a timeout to stop your script, you can catch the exception selenium.common.exceptions.TimeoutException and ignore it.

You can set the default page load timeout using the set_page_load_timeout() method of webdriver.

Like this

driver.set_page_load_timeout(10)

This will raise a TimeoutException if the page does not load within 10 seconds.
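As a rough sketch of how the timeout and the exception fit together, the helper below sets the page load timeout and reports a slow page as `False` instead of letting the exception end the script. The function name `load_with_timeout` is made up for illustration, and the fallback `TimeoutException` class exists only so the snippet can run where Selenium is not installed.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback for illustration only, so the sketch runs without Selenium installed.
    class TimeoutException(Exception):
        pass


def load_with_timeout(driver, url, seconds=10):
    """Load url; return False instead of raising if it takes longer than `seconds`.

    `load_with_timeout` is a hypothetical helper, not part of Selenium.
    """
    driver.set_page_load_timeout(seconds)
    try:
        driver.get(url)
        return True
    except TimeoutException:
        # The page did not finish loading in time; let the caller decide what to do.
        return False
```

The caller can then skip the link or try it again later, for example `if not load_with_timeout(driver, link): continue`.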

EDIT: Forgot to mention that you will have to put your code in a loop.

Add import

from selenium.common.exceptions import TimeoutException

while True:
    try:
        # Your code here
        break # Loop will exit
    except TimeoutException:
        pass
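Two caveats about the loop above: it retries forever if the failure persists, and the traceback in the question ends in `httplib.BadStatusLine`, which is not a `TimeoutException`, so that loop would not catch it as written. A minimal sketch of a bounded retry that covers both exception types might look like this. The names `run_with_retries` and `RETRYABLE` are hypothetical, and whether a simple retry is enough is an assumption: an empty status line may mean the PhantomJS process stopped responding, in which case the action may also need to recreate the driver.

```python
import time

try:
    from httplib import BadStatusLine  # Python 2, as in the question's traceback
except ImportError:
    from http.client import BadStatusLine  # Python 3

try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback for illustration only, so the sketch runs without Selenium installed.
    class TimeoutException(Exception):
        pass

# Errors treated as worth retrying (an assumption; adjust for your crawler).
RETRYABLE = (TimeoutException, BadStatusLine)


def run_with_retries(action, attempts=3, delay=1.0, retryable=RETRYABLE):
    """Call action() until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retryable:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)  # brief pause before the next try
```

With the question's code, this could wrap a single step, for example `run_with_retries(crawler.Run)`, so that one bad response does not abort the whole input file.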
Raghav Sharma