2

I'm using dryscrape to scrape some HTML data from different pages. It's all part of a django application, but I found this problem appears while using python shell as well. Problem with second connection. I'm using:

Python 2.7.6 (default, Mar  4 2014, 13:14:52) 
dryscrape Version: 0.9
webkit-server Version: 1.0
xvfbwrapper Version: 0.2.5

Below you can see way how I would like to use it

Python 2.7.6 (default, Mar  4 2014, 13:14:52) 
Type "copyright", "credits" or "license" for more information.

IPython 2.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import dryscrape

In [2]: from xvfbwrapper import Xvfb

In [3]: x = Xvfb()

In [4]: x.start()

In [5]: session = dryscrape.Session(base_url='http://google.com')

In [6]: session.visit('')

In [7]: session.url()
Out[7]: u'http://www.google.pl/?gfe_rd=cr&ei=d95qVvLfFc2v8wfamoG4Aw'

In [8]: x.stop()

Everything's fine for now. But if I try to continue, with another session

...
In [8]: x.stop()

In [9]: x = Xvfb()

In [10]: x.start()

In [11]: session = dryscrape.Session(base_url='http://google.com')
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-11-6cbe39a8459d> in <module>()
----> 1 session = dryscrape.Session(base_url='http://google.com')

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/dryscrape/session.pyc in __init__(self, driver, base_url)
     16                driver = None,
     17                base_url = None):
---> 18     self.driver = driver or DefaultDriver()
     19     self.base_url = base_url
     20 

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/dryscrape/driver/webkit.pyc in __init__(self, **kw)
     28   def __init__(self, **kw):
     29     kw.setdefault('node_factory_class', NodeFactory)
---> 30     super(Driver, self).__init__(**kw)

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/webkit_server.pyc in __init__(self, connection, node_factory_class)
    228                node_factory_class = NodeFactory):
    229     super(Client, self).__init__()
--> 230     self.conn = connection or ServerConnection()
    231     self._node_factory = node_factory_class(self)
    232 

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/webkit_server.pyc in __init__(self, server)
    505   def __init__(self, server = None):
    506     super(ServerConnection, self).__init__()
--> 507     self._sock = (server or get_default_server()).connect()
    508     self.buf = SocketBuffer(self._sock)
    509     self.issue_command("IgnoreSslErrors")

/home/mefioo/public_html/kariera_naukowa/env/lib/python2.7/site-packages/webkit_server.pyc in connect(self)
    438     """ Returns a new socket connection to this server. """
    439     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
--> 440     sock.connect(("127.0.0.1", self._port))
    441     return sock
    442 

/usr/local/lib/python2.7/socket.pyc in meth(name, self, *args)
    222 
    223 def meth(name,self,*args):
--> 224     return getattr(self._sock,name)(*args)
    225 
    226 for _m in _socketmethods:

error: [Errno 111] Connection refused

I do that just for example, because in my django app it's part of view logic, and requesting that view second time results in this error. Restarting django server or python shell solves it but only for first connection, so it's useless for working webpage. Am I missing some "clean" or "restart" of X session, or webkit-server (capibara-webkit) between those two?

Mateusz Knapczyk
  • 276
  • 1
  • 2
  • 15

1 Answers1

0

Ok, this is not a "real" answer, because I still don't know what is wrong, but I found a way to get this working. I have upgraded dryscrape to 1.0 and used it's new method dryscrape.start_xvfb() instead of xvfbwrapper Xvfb(). Everything's working fine.

Mateusz Knapczyk
  • 276
  • 1
  • 2
  • 15