
I'm still learning Scrapy, and I am trying to use Scrapy with Scrapyd inside a Django project.

But I'm noticing that the spider just won't enter the `parse` method:

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class NewsSpider(CrawlSpider):
    print("Start SPIDER")
    name = 'detik'
    allowed_domains = ['news.detik.com']
    start_urls = ['https://news.detik.com/indeks/all/?date=02/28/2018']

    def parse(self, response):
        print("SEARCH LINK")
        urls = response.xpath("//article/div/a/@href").extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_detail)

    def parse_detail(self, response):
        print("SCRAPEEE")
        x = {}
        x['breadcrumbs'] = response.xpath("//div[@class='breadcrumb']/a/text()").extract()
        x['tanggal'] = response.xpath("//div[@class='date']/text()").extract_first()
        x['penulis'] = response.xpath("//div[@class='author']/text()").extract_first()
        x['judul'] = response.xpath("//h1/text()").extract_first()
        x['berita'] = response.xpath("normalize-space(//div[@class='detail_text'])").extract_first()
        x['tag'] = response.xpath("//div[@class='detail_tag']/a/text()").extract()
        x['url'] = response.request.url
        return x

The print("Start SPIDER") appears in the log, but the print("SEARCH LINK") does not.

I also get this error:

  [Launcher,3804/stderr] Unhandled error in Deferred:  

Please help. PS: when I run it outside Django, it works just fine.

Thank you

Vira Xeva
  • Try this tutorial, maybe it will help: https://medium.com/@ali_oguzhan/how-to-use-scrapy-with-django-application-c16fabd0e62e – Druta Ruslan Jun 01 '18 at 14:18
  • I actually TRIED the tutorial, and it works, but when I changed it into my own spider it just won't work (the tutorial doesn't need parse). – Vira Xeva Jun 01 '18 at 14:29
  • Do you see the request going to https://news.detik.com/indeks/all/?date=02/28/2018 in the terminal? – Umair Ayub Jun 01 '18 at 17:21
  • Yes, I added a print after the start_urls and it works. I think the Unhandled error in Deferred is more the problem. – Vira Xeva Jun 01 '18 at 23:56

1 Answer


It seems to me that you are missing the crawling rules in your spider.

Try adding

rules = (
    Rule(LinkExtractor(allow='.+', unique=True), callback='parse'),
)

to your code as a class attribute, after the start_urls.
I don't understand how it could work outside of Django, though.

Nicolò Gasparini
  • Hello, it seems that the problem was in the pipeline, not the spider. I already solved it, but I don't know how to close this thread. The spider works just fine. – Vira Xeva Jun 18 '18 at 03:32