Scrapy multiple pages in same structure

Question

I have the following code:

import scrapy
import re


class NamePriceSpider(scrapy.Spider):
    name = 'namePrice'
    start_urls = [
        'https://www.cotodigital3.com.ar/sitios/cdigi/browse/'
    ]

    def parse(self, response):
        all_category_products = response.xpath('//*[@id="products"]')
        for product in all_category_products:
            name = product.xpath('//div[@class="descrip_full"]/text()').extract()
            price = product.xpath(
                '//span[@class="atg_store_productPrice" and not(@style)]'
                '/span[@class="atg_store_newPrice"]/text()'
                ' | //span[@class="price_discount"]/text()'
            ).re(r'\$\d{1,5}(?:[.,]\d{3})*(?:[., ]\d{2})*')

            yield {'name': name,
                   'price': price}

            next_page = response.xpath('//a[@title = "Siguiente"]/@href').extract_first()
            next_page = response.urljoin(next_page)

            if next_page:
                yield scrapy.Request(url=next_page, callback=self.parse)

which works perfectly fine: it scrapes the names and prices of products across multiple pages of a supermarket website. The problem I am having is that when I export everything to a JSON file, each page ends up as a separate structure, like {"name": ["a", "b", "c"], "price": ["10", "20", "30"]} for one page and {"name": ["d", "f", "g"], "price": ["40", "50", "60"]} for another. I want a single structure for all the pages, so that it is easier to iterate over, like so: {"name": ["a", "b", "c", "d", "f", "g"], "price": ["10", "20", "30", "40", "50", "60"]}. Is there a way to achieve this?
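One possible way to get that single structure, sketched here as an assumption rather than as the asker's code: keep the spider as it is, export with Scrapy's JSON feed (which writes a JSON array with one object per yielded item, here one per page), and concatenate the per-page lists afterwards. The helper names `merge_items` and `merge_feed` are hypothetical.

```python
import json
from typing import Dict, List


def merge_items(items: List[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Concatenate the per-page "name" and "price" lists into one dict."""
    merged: Dict[str, List[str]] = {"name": [], "price": []}
    for item in items:
        # Each exported item holds the lists scraped from a single page.
        merged["name"].extend(item.get("name", []))
        merged["price"].extend(item.get("price", []))
    return merged


def merge_feed(path: str) -> Dict[str, List[str]]:
    """Load a Scrapy JSON feed (a JSON array of items) and merge it."""
    with open(path, encoding="utf-8") as fh:
        return merge_items(json.load(fh))
```

Usage, assuming the crawl was run with `scrapy crawl namePrice -O products.json` (`-O` overwrites; appending with `-o` to an existing `.json` file produces invalid JSON): `merged = merge_feed("products.json")`. Two caveats: Scrapy issues requests concurrently, so pages are not guaranteed to be merged in page order, and the i-th name only lines up with the i-th price if every product on every page produced exactly one of each. A list of one `{"name": ..., "price": ...}` object per product avoids that pairing problem.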

Oh, this is a case of `//` instead of `.//`, a typical XPath typo, so don't feel bad about it: `//` searches the whole HTML document, even when called from `product.xpath()`. Also, consider using `get()` instead of `extract()` to get a single name, and `re_first()` instead of `re()` to get a single price. - Gallaecio, Nov 06 '20 at 16:15
