After my spider has crawled all the URLs, Scrapy doesn't stop. How can I make it stop by itself once the crawl has finished?
The start URL is http://192.168.139.28/dvwa.
After the spider finishes, it keeps logging Starting new HTTP connection (1): 192.168.139.28, and I don't know how to make it stop on its own. Can you help me?
Here is the log output:
'retry/reason_count/504 Gateway Time-out': 2,
'scheduler/dequeued': 82,
'scheduler/dequeued/memory': 82,
'scheduler/enqueued': 82,
'scheduler/enqueued/memory': 82,
'splash/execute/request_count': 40,
'splash/execute/response_count/200': 38,
'splash/execute/response_count/400': 1,
'splash/execute/response_count/504': 3,
'start_time': datetime.datetime(2018, 1, 10, 6, 36, 4, 298146)}
2018-01-10 14:37:48 [scrapy.core.engine] INFO: Spider closed (finished)
2018-01-10 14:38:41 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:38:41 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:39:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:39:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:40:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:40:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:41:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:41:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:42:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:42:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:43:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:43:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:44:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:44:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:45:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:45:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:46:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:46:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:47:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:47:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:48:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:48:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:49:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:49:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:50:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:50:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:51:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:51:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:52:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:52:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
2018-01-10 14:53:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
2018-01-10 14:53:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
...
I am using scrapy-splash with Scrapy, and the Splash server returned 504 Gateway Time-out errors (as shown in the stats above), so I tried starting the Splash server with docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 3600, but it doesn't help: Scrapy is still Starting new HTTP connection (1): 192.168.139.28.
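For context, the spider sends its requests through Splash roughly like this (a simplified sketch; the Lua script, the timeout value and the spider internals are assumptions, not my exact code):

import scrapy
from scrapy_splash import SplashRequest

# Lua script executed by the Splash /execute endpoint
LUA_SCRIPT = """
function main(splash, args)
    splash:go(args.url)
    splash:wait(1)
    return splash:html()
end
"""

class Exp10itSpider(scrapy.Spider):
    name = 'exp10it'
    start_urls = ['http://192.168.139.28/dvwa']

    def start_requests(self):
        for url in self.start_urls:
            # 'timeout' is the per-request Splash render timeout; it has to stay
            # below the --max-timeout value the Splash container was started with
            yield SplashRequest(url, self.parse, endpoint='execute',
                                args={'lua_source': LUA_SCRIPT, 'timeout': 600})

    def parse(self, response):
        # follow links / extract data here
        pass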
The code I use to start the spider is:
import os
from scrapy import cmdline

os.chdir("./crawler")  # switch into the Scrapy project directory
cmdline.execute('scrapy crawl exp10it'.split())
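For comparison, the equivalent in-process launch with CrawlerProcess would look roughly like this (a minimal sketch, assuming a standard project layout where get_project_settings() can locate the project's scrapy.cfg); I don't know whether it would stop any differently:

import os
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

os.chdir("./crawler")
process = CrawlerProcess(get_project_settings())
process.crawl('exp10it')   # spider name registered in the project
process.start()            # blocks until the crawl finishes and the reactor stops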
Later, when I run scrapy crawl exp10it directly from the command line, the issue doesn't appear and Scrapy stops normally after the crawl finishes, but I don't know why cmdline.execute('scrapy crawl exp10it'.split()) doesn't stop.