I have followed Splash's FAQ for production setups and my system currently looks like this:
- 1 Scrapy Container with 6 concurrency requests.
- 1 HAProxy Container that load balance to splash containers
- 2 Splash Containers with 3 slots each.
I use docker stats
to monitor my setup and I never get more than 7% CPU usage or more than 55% Memory usage.
I still get a lot of
DEBUG: Retrying <GET https://the/url/ via http://haproxy:8050/execute> (failed 1 times): 504 Gateway Time-out
For every successful request I get 6-7 of these timeouts.
I have experimented with changing the slots of the splash containers and the amount of concurrency requests. I've also tried running with a single splash container behind the HAProxy. I keep getting these errors.
I'm running on a AWS EC2 t2.micro instance which have 1gb memory.
I suspect that the issue is still related to the splash instance getting flooded. Is there any advice you can give me to reduce the load of the Splash instances? Is there a good ratio between slots and concurrency requests? Should I throttle requests?