I am trying to scrape data from a website. Everything seems to be correct, and the XPath expressions were tested in the Scrapy shell.

# -*- coding: utf-8 -*-

from scrapy.contrib.spiders import CrawlSpider


class KabumspiderSpider(CrawlSpider):
    name = "kabumspider"
    allowed_domain = ["www.kabum.com.br"]
    start_urls = ["https://www.kabum.com.br"]


def parse(self, response):
        categorias = response.xpath('//p[@class = "bot-categoria"]/a/text()').extract()
        links = response.xpath('//p[@class = "bot-categoria"]/a/@href').extract()

        for categoria in zip(categorias, links):

            info = {
                'categoria': categoria[0],
                'link': categoria[1],
            }
            yield info

However, the output is only:

[

What is wrong with my code?

  • have you tried testing out the outputs in the scrapy shell? Also you should probably create items first, write the outputs to the item properties and write the items to the JSON file. – cyril Sep 08 '17 at 00:48
  • I did use items but I thought that that might have been the problem so I did it again that time using dictionaries... Everything seems to work just fine in the scrapy shell – Marcus Vinícius Sep 08 '17 at 00:54
  • if you put `print`s inside the for, can you see them? Also do you have any custom pipeline enabled? – eLRuLL Sep 08 '17 at 02:26

1 Answer

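I ran the scraper and it works fine for me. The only issue I found is that your parse method is defined outside the class, at module level, so it never becomes a method of the spider. The spider falls back to the parse method it inherits from CrawlSpider, which yields no items when no rules are defined, and that is why the output is empty. Indent parse so that it belongs to the class: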

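# -*- coding: utf-8 -*-

# scrapy.contrib is deprecated; CrawlSpider now lives in scrapy.spiders.
from scrapy.spiders import CrawlSpider


class KabumspiderSpider(CrawlSpider):
    name = "kabumspider"
    # The attribute Scrapy reads is allowed_domains (plural).
    allowed_domains = ["www.kabum.com.br"]
    start_urls = ["https://www.kabum.com.br"]

    # Indented under the class, so Scrapy uses it as the default callback.
    def parse(self, response):
        categorias = response.xpath('//p[@class = "bot-categoria"]/a/text()').extract()
        links = response.xpath('//p[@class = "bot-categoria"]/a/@href').extract()

        # Pair each category name with its link and yield one item per pair.
        for categoria, link in zip(categorias, links):
            info = {
                'categoria': categoria,
                'link': link,
            }
            yield info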
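
To see why the indentation matters without running Scrapy at all, here is a minimal sketch in plain Python. The names BaseSpider, BrokenSpider and FixedSpider are hypothetical stand-ins (BaseSpider plays the role of a framework base class whose default parse produces nothing), and the 'Hardware' value is made-up sample data.

```python
class BaseSpider:
    """Hypothetical stand-in for a framework base class such as CrawlSpider."""

    def parse(self, response):
        # Default callback: produces no items.
        return []


class BrokenSpider(BaseSpider):
    name = "broken"


# Not indented, so this is a plain module-level function, not a method.
# Nothing ever attaches it to BrokenSpider.
def parse(self, response):
    return [{"categoria": "Hardware"}]  # made-up sample item


class FixedSpider(BaseSpider):
    name = "fixed"

    # Indented under the class, so it overrides BaseSpider.parse.
    def parse(self, response):
        return [{"categoria": "Hardware"}]  # made-up sample item


broken_items = BrokenSpider().parse(None)  # resolves to BaseSpider.parse
fixed_items = FixedSpider().parse(None)    # resolves to the override
print(broken_items, fixed_items)
```

The broken class raises no error, because the inherited method is silently used instead, which is consistent with the spider in the question finishing normally but writing no items.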
Tarun Lalwani