Scrapy returning zero results

Question

I am trying to learn Scrapy and am working on what I think is a simple project: pulling two pieces of data from a single webpage, with no need to crawl additional links. However, my code returns zero results. I have tested both XPaths in the Scrapy shell, and both return the expected results.

My item.py is:

import scrapy

class StockItem(scrapy.Item):
    quote = scrapy.Field()
    time = scrapy.Field()

My spider, named stockscrapy.py, is:

import scrapy

class StockSpider(scrapy.Spider):
    name = "ugaz"
    allowed_domains = ["nasdaq.com"]
    start_urls = ["http://www.nasdaq.com/symbol/ugaz/"]

def parse(self, response):
    stock = StockItem()
    stock['quote'] = response.xpath('//*[@id="qwidget_lastsale"]/text()').extract()
    stock['time'] = response.xpath('//*[@id="qwidget_markettime"]/text()').extract()
    return stock

To run the script, I use the command line:

scrapy crawl ugaz -o stocks.csv

Any and all help is greatly appreciated.

Some websites block scraping. I believe Nasdaq is one of them, but I'm not 100% sure. — reticentroot, May 04 '15 at 03:16
Try changing the `User-Agent` to a Chrome or Firefox one, following the instructions here: http://stackoverflow.com/questions/18920930/scrapy-python-set-up-user-agent — number5, May 04 '15 at 03:25
Could you please add the proper start URL? This start URL will give you only a single item to yield, and for that you don't need to write a spider. — Jithin, May 04 '15 at 05:16
What is the output of the scrapy command? The code runs fine when you indent the parse block. — Frank Martin, May 04 '15 at 05:56
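For the User-Agent suggestion, one way to apply it project-wide is Scrapy's `USER_AGENT` setting in the project's settings.py. This is a minimal sketch: the browser string is only an illustrative example (not a value from this thread), and whether nasdaq.com actually filters on User-Agent is not confirmed here.

```python
# settings.py (project-wide Scrapy settings)
# Illustrative browser-like User-Agent; substitute a current string
# copied from your own browser.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)
```

The same key can instead be set for one spider through its `custom_settings` class attribute, or for a single run with `scrapy crawl ugaz -s USER_AGENT="..." -o stocks.csv`.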

score 1 · Accepted Answer · answered May 04 '15 at 07:28

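You need to indent the parse block. Unindented, `parse` is a module-level function rather than a method of `StockSpider`, so Scrapy falls back to the base `Spider.parse`, which raises `NotImplementedError`, and no items are produced. The spider module also needs to import `StockItem`.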

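import scrapy

# Import the item class; without this, StockItem() raises NameError.
# Assumes item.py lives in the project package, one level above spiders/.
from ..item import StockItem

class StockSpider(scrapy.Spider):
    name = "ugaz"
    allowed_domains = ["nasdaq.com"]
    start_urls = ["http://www.nasdaq.com/symbol/ugaz/"]

    # Indent this block so parse becomes a method of StockSpider
    def parse(self, response):
        stock = StockItem()
        stock['quote'] = response.xpath('//*[@id="qwidget_lastsale"]/text()').extract()
        stock['time'] = response.xpath('//*[@id="qwidget_markettime"]/text()').extract()
        return stock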
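To see why the unindented version produces nothing, here is a self-contained sketch that runs without Scrapy installed. `Spider` is a hypothetical stand-in for `scrapy.Spider` that mimics one real behavior: the base `parse` raises `NotImplementedError` when a subclass does not override it. `try_parse` is a hypothetical helper standing in for the crawler invoking the callback.

```python
# Hypothetical stand-in for scrapy.Spider: like Scrapy's base class,
# its default parse() raises NotImplementedError.
class Spider:
    def parse(self, response):
        raise NotImplementedError(f"{type(self).__name__}.parse callback is not defined")


class StockSpider(Spider):
    name = "ugaz"

# Not indented: this defines a module-level function named parse,
# so StockSpider still inherits Spider.parse.
def parse(self, response):
    return {"source": "module-level parse"}


class FixedStockSpider(Spider):
    name = "ugaz"

    # Indented: this overrides Spider.parse.
    def parse(self, response):
        return {"source": "FixedStockSpider.parse"}


def try_parse(spider):
    """Call parse() the way a crawler would and report the outcome."""
    try:
        return spider.parse(response=None)
    except NotImplementedError as exc:
        return f"error: {exc}"


print(try_parse(StockSpider()))       # prints: error: StockSpider.parse callback is not defined
print(try_parse(FixedStockSpider()))  # prints: {'source': 'FixedStockSpider.parse'}
```

Indentation alone decides whether `def parse` belongs to the class body, which is why the two spiders behave differently even though their `parse` code is the same.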

Thank you, Frank! I'm new to Python and didn't realize the importance of proper indentation. — DrJP, May 04 '15 at 22:54
