In my settings.py:

SPLASH_URL = 'http://127.0.0.1:8050'
DOWNLOADER_MIDDLEWARES = {
  'scrapy_splash.SplashCookiesMiddleware': 723,
  'scrapy_splash.SplashMiddleware': 725,
  'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
  'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

My spider source code:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider
from scrapy_splash import SplashRequest

class SampleSpider(CrawlSpider):
  name = 'sample'
  allowed_domains = ['sample.com']

  def start_requests(self):
    urls = [
      'https://www.sample.com/view-all-clothing/bottoms/leggings'
    ]

    for url in urls:
      yield SplashRequest(url=url, callback=self.parse)

  def parse(self, response):
    # Emit one item per product tile on the rendered page.
    for item in response.css("li.product-compact"):
      yield {
        'category_link': response.request.url,
        'title': item.css("a.pdp-link::text").extract()
      }

Docker container

MINGW64 /c/Program Files/Docker Toolbox
$ docker container ls
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                                NAMES
75b69d937e79        scrapinghub/splash   "python3 /app/bin/sp…"   16 minutes ago      Up 16 minutes       5023/tcp, 127.0.0.1:8050->8050/tcp   vigilant_chatterjee

And I still get this error:

2018-07-10 15:18:35 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://127.0.0.1:8050/robots.txt> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:36 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://127.0.0.1:8050/robots.txt> (failed 2 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:37 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://127.0.0.1:8050/robots.txt> (failed 3 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:37 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://127.0.0.1:8050/robots.txt>: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
ConnectionRefusedError: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 1 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:39 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 2 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html> (failed 3 times): Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.sample.com/view-all-clothing/bottoms/leggings via http://127.0.0.1:8050/render.html>: Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it..
2018-07-10 15:18:40 [scrapy.core.engine] INFO: Closing spider (finished)

I have applied all of these settings and, as far as I can tell, they are correct, but I can't figure out what I did wrong.

Please let me know, as I am still new to Python, Scrapy and the Splash JS rendering service.

ישו אוהב אותך
user1441797

1 Answer

SPLASH_URL should be set like this in settings.py:

SPLASH_URL = 'http://0.0.0.0:8050'

The Docker container should also be listening on the server's network interface, so that the port mapping reads:

0.0.0.0:8050->8050/tcp
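
Before changing settings.py, it can help to confirm which address Splash actually answers on. The following is a minimal sketch, not part of the original setup: `can_connect` is a hypothetical helper, and the candidate hosts are assumptions. In particular, 192.168.99.100 is only the usual default address of the Docker Toolbox VM (Docker Toolbox runs containers inside a VirtualBox VM, so a published port may live on the VM's address rather than on Windows' own 127.0.0.1); `docker-machine ip` prints the real one.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when nothing is reachable at host:port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed candidates: the address from the question and the common
    # Docker Toolbox VM default. Replace with the output of `docker-machine ip`.
    for host in ("127.0.0.1", "192.168.99.100"):
        status = "reachable" if can_connect(host, 8050, timeout=1.0) else "refused/unreachable"
        print(f"{host}:8050 is {status}")
```

Whichever host is reported as reachable is a reasonable candidate for SPLASH_URL; if none is, the port mapping of the container is the first thing to re-check.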
Thanh Nguyen Van