
I have scrapy and scrapy-splash set up on an AWS Ubuntu server. It works fine for a while, but after a few hours I'll start getting error messages like this:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.5/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 111: Connection refused.

I'll find that the Splash process in Docker has either terminated or is unresponsive.

I've been running the Splash process with:

sudo docker run -p 8050:8050 scrapinghub/splash

as per the scrapy-splash instructions.

I tried starting the process in a tmux session to make sure the SSH connection wasn't interfering with the Splash process, but no luck.

Thoughts?

Ike

1 Answer


You should run the container with the --restart and -d options. See the documentation on how to run Splash in production.
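
As a minimal sketch, keeping the port mapping from the question, something along these lines should work; the memory values are illustrative only, and the exact flags are described in the Splash production docs:

# Run Splash detached (-d) and have Docker restart the container
# automatically if the process dies (--restart=always).
# --memory caps the container's RAM; --maxrss asks Splash itself to stay
# below roughly that many MB. Tune both for your instance.
sudo docker run -d \
    -p 8050:8050 \
    --restart=always \
    --memory=4.5G \
    scrapinghub/splash --maxrss 3000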

Tomáš Linhart
    Thanks. I just stumbled upon this idea myself, googling around, but that is a great resource. I will just add for the next person... you can see why your docker processes failed with 'docker ps -a', and note that exit code 137 seems to be related to memory overuse. And what @Tomas was suggesting is to have docker automaticalky restart the process when it fails due to lack of memory. – Ike Aug 02 '17 at 06:26
  • Splash is not sending the correct URL. Please have a look at this: https://stackoverflow.com/questions/63212796/why-is-scrapy-splash-not-sending-correct-url I need help. – Saurav Pathak Aug 02 '20 at 05:53
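
Following up on the comment about exit codes, a quick sketch of those checks; <container_id> is a placeholder for whatever 'docker ps -a' reports for your Splash container:

# List all containers, including stopped ones, together with their status
sudo docker ps -a

# Show just the exit code of a given container; 137 (128 + SIGKILL)
# typically means the process was killed, often by the kernel OOM killer
sudo docker inspect --format '{{.State.ExitCode}}' <container_id>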