After my spider has crawled all the URLs, Scrapy doesn't stop. How can I make it stop after the crawl has finished?

The start URL is http://192.168.139.28/dvwa.

After my spider finishes, it keeps logging Starting new HTTP connection (1): 192.168.139.28, and I don't know how to make it stop by itself. Can you help me?
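(For reference, I know the standard way to stop a spider deliberately from inside a callback is to raise CloseSpider; a rough sketch with a made-up spider name, not my real code:)

import scrapy
from scrapy.exceptions import CloseSpider

class StopExampleSpider(scrapy.Spider):
    name = 'stop_example'  # hypothetical name, for illustration only
    start_urls = ['http://192.168.139.28/dvwa']

    def parse(self, response):
        # CloseSpider asks the engine to shut this spider down gracefully.
        raise CloseSpider('all urls crawled')

But my problem is different: the spider itself already closes (see the log below), yet the process keeps making connections afterwards.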

Here is the output:

 'retry/reason_count/504 Gateway Time-out': 2,
 'scheduler/dequeued': 82,
 'scheduler/dequeued/memory': 82,
 'scheduler/enqueued': 82,
 'scheduler/enqueued/memory': 82,
 'splash/execute/request_count': 40,
 'splash/execute/response_count/200': 38,
 'splash/execute/response_count/400': 1,
 'splash/execute/response_count/504': 3,
 'start_time': datetime.datetime(2018, 1, 10, 6, 36, 4, 298146)}
  2018-01-10 14:37:48 [scrapy.core.engine] INFO: Spider closed (finished)
  2018-01-10 14:38:41 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:38:41 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:39:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:39:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:40:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:40:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:41:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:41:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:42:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:42:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:43:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:43:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:44:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:44:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:45:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:45:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:46:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:46:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:47:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:47:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:48:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:48:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:49:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:49:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:50:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:50:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:51:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:51:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:52:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:52:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  2018-01-10 14:53:42 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 192.168.139.28
  2018-01-10 14:53:42 [urllib3.connectionpool] DEBUG: http://192.168.139.28:80 "GET / HTTP/1.1" 200 3041
  ...

I am using scrapy_splash with Scrapy, and the Splash server returned 504 errors like the ones shown in the stats above. I then tried starting the Splash server with docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 3600, but it didn't help; Scrapy still keeps logging Starting new HTTP connection (1): 192.168.139.28.
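For reference, the per-request timeout can also be passed to Splash in the request args (it is capped by the server's --max-timeout). This is only a rough sketch, not my real spider: my stats show the execute endpoint, but I use render.html here to keep it short, and the timeout/wait values are just examples.

import scrapy
from scrapy_splash import SplashRequest

class TimeoutSketchSpider(scrapy.Spider):
    name = 'timeout_sketch'  # hypothetical name, for illustration only

    def start_requests(self):
        # 'timeout' is the per-request Splash timeout in seconds; it must not
        # exceed the server's --max-timeout. 'wait' is how long Splash waits
        # after the page loads. Both values here are just examples.
        yield SplashRequest(
            'http://192.168.139.28/dvwa',
            callback=self.parse,
            endpoint='render.html',
            args={'timeout': 90, 'wait': 0.5},
        )

    def parse(self, response):
        self.logger.info('got %s', response.url)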

The script I use to launch the spider is:

import os
from scrapy import cmdline

os.chdir("./crawler")  # run from the Scrapy project directory
cmdline.execute('scrapy crawl exp10it'.split())
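For comparison, here is a rough sketch of launching the same crawl with CrawlerProcess instead of cmdline.execute; process.start() blocks until the crawl finishes and then returns, so the script can exit on its own (assuming it runs where the project's scrapy.cfg can be found):

import os
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

os.chdir('./crawler')  # same project directory as in the script above
process = CrawlerProcess(get_project_settings())
process.crawl('exp10it')  # spider name, resolved via the project's spider loader
process.start()  # blocks until the crawl is finished, then returns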

Later, when I run scrapy crawl exp10it from the command line, the issue doesn't appear and Scrapy stops normally after the crawl finishes, but I don't know why cmdline.execute('scrapy crawl exp10it'.split()) doesn't stop.
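Since the plain command line run exits cleanly, one workaround I can think of is to run the same command in a child process instead of calling cmdline.execute in-process; a rough sketch:

import subprocess

# Runs 'scrapy crawl exp10it' in the project directory and waits for it to
# finish; check=True raises CalledProcessError if the crawl command fails.
subprocess.run(['scrapy', 'crawl', 'exp10it'], cwd='./crawler', check=True)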
