
I'm new to Python and web scraping. In this program I want to write the final output (product name and price from all 3 links) to a JSON file. Please help!

    import scrapy
    from time import sleep
    import csv, os, json
    import random


    class spider1(scrapy.Spider):
        name = "spider1"

        def start_requests(self):
            urls = [
                "https://www.example.com/item1",
                "https://www.example.com/item2",
                "https://www.example.com/item3"]

            for url in urls:
                yield scrapy.Request(url, callback=self.parse)
                sleep(random.randint(0, 5))

        def parse(self, response):
            product_name = response.css('#pd-h1-cartridge::text')[0].extract()
            product_price = response.css(
                '.product-price .is-current, .product-price_total .is-current, .product-price_total ins, .product-price ins').css(
                '::text')[3].extract()

            name = str(product_name).strip()
            price = str(product_price).replace('\n', "")

            data = {'name': name, 'price': price}

            yield data

    extracted_data = []
    while i < len(data):
        extracted_data.append()
        sleep(5)

    f = open('data.json', 'w')
    json.dump(extracted_data, f, indent=4)
amal

3 Answers


There is actually a scrapy command to do this:

scrapy crawl <spidername> -o <outputname>.<format>
scrapy crawl quotes -o quotes.json

But since you asked for the Python code, I came up with this:

    import json

    def parse(self, response):
        quotes = response.css('div.quote')
        with open("data_file.json", "w") as f:
            f.write('[')
            for index, quote in enumerate(quotes):
                json.dump({
                    'text': quote.css('span.text::text').get(),
                    'author': quote.css('.author::text').get(),
                    'tags': quote.css('.tag::text').getall()
                }, f)
                if index < len(quotes) - 1:
                    f.write(',')
            f.write(']')

Which simply does the same thing as the scrapy output command for json files.
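The bracket-and-comma bookkeeping can also be avoided entirely by collecting the records in a list first and serializing once. A plain-Python sketch of that idea (the two dicts stand in for what the selector calls above would extract):

```python
import json

# Stand-ins for the extracted quotes; in the spider these would come
# from response.css('div.quote') as in the snippet above.
quotes = [
    {'text': 'To be or not to be', 'author': 'Shakespeare', 'tags': ['life']},
    {'text': 'Simple is better than complex', 'author': 'Tim Peters', 'tags': ['python']},
]

# Build the whole list in memory, then serialize it in one call --
# json.dump writes the brackets and commas for you.
with open('data_file.json', 'w') as f:
    json.dump(quotes, f, indent=4)
```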

Mehrdad

You don't need to create the file yourself; scrapy can do it. First define an Item (optionally populated through an ItemLoader) and yield it from the final parse callback. If you need the data in JSON format, add the -o parameter when you crawl the spider.

for example:

scrapy crawl <spidername> -o <filename>.json
Justo

You're not closing the data.json file, so its contents stay buffered and may never be written to disk.

Either call close() explicitly:

f = open('data.json', 'w')
json.dump(extracted_data, f, indent=4)
f.close()

or use a with statement that automatically closes the file for you:

with open('data.json', 'w') as f:
    json.dump(extracted_data, f, indent=4)

Make sure you really want to overwrite the file each time using the 'w' flag. If not, use the 'a' append flag instead.
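One caveat with the 'a' flag: appending a fresh json.dump per link produces several JSON documents back to back in one file, which json.load rejects. A small sketch of the pitfall and the safer accumulate-then-dump pattern (the names and prices are made-up placeholders):

```python
import json

# Appending one json.dump per link yields back-to-back JSON objects --
# the file is not a single valid JSON document.
with open('data.json', 'w') as f:
    json.dump({'name': 'item1', 'price': '9.99'}, f)
with open('data.json', 'a') as f:
    json.dump({'name': 'item2', 'price': '19.99'}, f)

with open('data.json') as f:
    raw = f.read()  # '{...}{...}' -- json.loads would raise "Extra data" here

# Safer: accumulate everything in a list, then dump the whole list once.
extracted_data = [
    {'name': 'item1', 'price': '9.99'},
    {'name': 'item2', 'price': '19.99'},
]
with open('data.json', 'w') as f:
    json.dump(extracted_data, f, indent=4)
```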

Basile
  • Thanks. But the JSON output shows only the name and price from the last link. I want to add the name and price from all 3 links to extracted_data and then dump it to the JSON file. – amal May 26 '19 at 17:52
  • The 'a' append flag is your friend then. – Basile May 26 '19 at 18:00