I'm using dryscrape/webkit_server for scraping JavaScript-enabled websites.
The memory usage of the webkit_server process seems to increase with each call to session.visit(). I can reproduce it with the following script:
import dryscrape

# urls is a list of roughly 300 URLs to scrape
for url in urls:
    session = dryscrape.Session()
    session.set_timeout(10)
    session.set_attribute('auto_load_images', False)
    session.visit(url)
    response = session.body()
I'm iterating over approx. 300 URLs, and after 70-80 of them webkit_server takes up about 3 GB of memory. However, the memory itself is not really the problem for me; the problem is that dryscrape/webkit_server seems to get slower with each iteration. After those 70-80 iterations dryscrape is so slow that it raises a timeout error (timeout set to 10 seconds) and I have to abort the Python script. Restarting webkit_server (e.g. after every 30 iterations) might help and would free the memory, but I'm not sure whether the 'memory leaks' are really what makes dryscrape slower and slower.
Does anyone know how to restart the webkit_server so I could test that?
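If restarting periodically is the way to go, the kind of thing I have in mind is the sketch below: process the URLs in batches, drop the Session, and kill any leftover webkit_server process before starting a fresh one. This assumes a POSIX system where the backend shows up as a process named webkit_server that pkill can match; the batch size of 30 and the kill_webkit_server helper are just placeholders.

import subprocess

import dryscrape

def kill_webkit_server():
    # Assumption: POSIX system where the backend spawned by dryscrape
    # is visible as a separate process named 'webkit_server'.
    subprocess.call(['pkill', '-f', 'webkit_server'])

def scrape(urls, batch_size=30):
    session = None
    for i, url in enumerate(urls):
        # Start a fresh session (and a fresh webkit_server) every batch_size URLs.
        if i % batch_size == 0:
            if session is not None:
                del session            # drop the Python-side reference first
                kill_webkit_server()   # then make sure the old backend is gone
            session = dryscrape.Session()
            session.set_timeout(10)
            session.set_attribute('auto_load_images', False)
        session.visit(url)
        response = session.body()
        # ... process response here ...

Whether something like this actually keeps the speed constant over all 300 URLs is exactly what I'd like to test.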
So far I have not found an acceptable workaround for this issue, but I also don't want to switch to another solution (selenium/phantomjs, ghost.py), because I simply love dryscrape for its simplicity. By the way, dryscrape works great as long as one is not iterating over too many URLs in one session.
This issue is also discussed here
https://github.com/niklasb/dryscrape/issues/41
and here