
So I have a simple crawler that crawls three store-locator pages and parses the store locations to JSON. When I print(app_data['stores']) it prints the stores from all three pages. However, when I try to write them out, only one of the three pages, seemingly at random, ends up in my JSON file. I'd like everything the spider scrapes to be written to the file. Any help would be great. Here's the code:

import scrapy
import json
import js2xml

from pprint import pprint

class StlocSpider(scrapy.Spider):
    name = "stloc"
    allowed_domains = ["bestbuy.com"]
    start_urls = (
        'http://www.bestbuy.com/site/store-locator/11356',
        'http://www.bestbuy.com/site/store-locator/46617',
        'http://www.bestbuy.com/site/store-locator/77521'
    )

    def parse(self, response):
        js = response.xpath('//script[contains(.,"window.appData")]/text()').extract_first()
        jstree = js2xml.parse(js)
        # print(js2xml.pretty_print(jstree))

        app_data_node = jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]
        app_data = js2xml.make_dict(app_data_node)
        print(app_data['stores'])

        for store in app_data['stores']:
            yield store

        with open('stores.json', 'w') as f:
            json.dump(app_data['stores'], f, indent=4)
rjdel
    You're overwriting the file every time you call `parse()`. You need to collect all the results into a list and write the entire list to the file. – Barmar Oct 13 '16 at 19:33
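A minimal sketch of what that suggestion could look like: accumulate the stores from every response and write the whole list once when the spider finishes, using Scrapy's closed() hook (the all_stores attribute is just an illustrative name, not part of the original code):

import scrapy
import json
import js2xml

class StlocSpider(scrapy.Spider):
    name = "stloc"
    allowed_domains = ["bestbuy.com"]
    start_urls = (
        'http://www.bestbuy.com/site/store-locator/11356',
        'http://www.bestbuy.com/site/store-locator/46617',
        'http://www.bestbuy.com/site/store-locator/77521'
    )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.all_stores = []  # collects the stores from every response

    def parse(self, response):
        js = response.xpath('//script[contains(.,"window.appData")]/text()').extract_first()
        jstree = js2xml.parse(js)
        app_data_node = jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]
        app_data = js2xml.make_dict(app_data_node)

        self.all_stores.extend(app_data['stores'])
        for store in app_data['stores']:
            yield store

    def closed(self, reason):
        # called once when the spider finishes, so the file is written a single time
        with open('stores.json', 'w') as f:
            json.dump(self.all_stores, f, indent=4)

With this, stores.json is written exactly once, after all three responses have been parsed.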

1 Answer


You are opening the file for writing every time, but you want to append. Try changing the last part to this:

with open('stores.json', 'a') as f:
    json.dump(app_data['stores'], f, indent=4)

Where 'a' opens the file for appending.
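One caveat with this approach: each call to parse() then appends its own JSON array, so stores.json ends up holding three concatenated arrays rather than a single JSON document. A quick sketch of the consequence when reading the file back:

import json

with open('stores.json') as f:
    data = json.load(f)  # raises json.JSONDecodeError ("Extra data"),
                         # because the file contains several top-level arrays

If a single valid JSON file is needed, collecting every page's stores and writing the list once (as suggested in the comment under the question) avoids this.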

pault