I have a master-worker network on AWS EC2 built with the Dask distributed library. For now I have one master machine and one worker machine. The master exposes a REST API (Flask) for scheduling Scrapy jobs on the worker machine. Both the master and the worker run in Docker, so the master container and the worker container communicate with each other through Dask distributed.
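To make the setup concrete, the flow is roughly the following. This is only a minimal sketch, not my actual code: the scheduler host name, the `run_spider`, `build_crawl_command` and `submit_crawl` helpers, the one-hour timeout, and the choice to start Scrapy through a subprocess are all assumptions made for illustration.

```python
import subprocess

# Hypothetical address; 8786 is the scheduler port published by the master container.
SCHEDULER_ADDRESS = "tcp://master-host:8786"


def build_crawl_command(spider_name, log_level=None):
    """Build the `scrapy crawl` command line for one spider."""
    command = ["scrapy", "crawl", spider_name]
    if log_level is not None:
        command += ["-L", log_level]  # Scrapy's log-level option
    return command


def run_spider(spider_name, project_dir=".", timeout_s=3600):
    """Runs on the worker: start the crawl in a child process.

    subprocess.run raises subprocess.TimeoutExpired if the crawl takes
    longer than timeout_s, so a hung crawl shows up as a failed task.
    """
    completed = subprocess.run(
        build_crawl_command(spider_name),
        cwd=project_dir,  # assumed to be the Scrapy project directory
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return completed.returncode


def submit_crawl(spider_name, scheduler_address=SCHEDULER_ADDRESS):
    """Runs on the master, for example from a Flask route handler."""
    # Imported here so the worker-side helpers above do not need dask at import time.
    from distributed import Client

    client = Client(scheduler_address)
    # pure=False: a crawl has side effects, so identical calls must not be deduplicated.
    return client.submit(run_spider, spider_name, pure=False)
```

In this sketch the Flask endpoint would call `submit_crawl("some_spider")` and keep the returned future to check the job status later.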
When I schedule a Scrapy job, crawling starts successfully and Scrapy uploads data to S3 as well. But after some time Scrapy gets stuck at one point and nothing happens after that.
Please check the attached log file for more info:
2019-01-02 08:05:30 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f1fe54adf28>>
Scrapy gets stuck at the above point.
Commands used to run the Docker containers:
sudo docker run --network host -d crawler-worker # for worker
sudo docker run -p 80:80 -p 8786:8786 -p 8787:8787 --net=host -d crawler-master # for master
I am facing this issue on a fresh EC2 machine as well.