I am learning Scrapy and trying what I think is a simple project: pulling two pieces of data from a single web page. Crawling additional links isn't needed. However, my spider returns zero results. I have tested both XPaths in the Scrapy shell, and both return the expected results.

My item.py is:

import scrapy

class StockItem(scrapy.Item):
    quote = scrapy.Field()
    time = scrapy.Field()

My spider, named stockscrapy.py, is:

import scrapy

class StockSpider(scrapy.Spider):
    name = "ugaz"
    allowed_domains = ["nasdaq.com"]
    start_urls = ["http://www.nasdaq.com/symbol/ugaz/"]

def parse(self, response):
    stock = StockItem()
    stock['quote'] = response.xpath('//*[@id="qwidget_lastsale"]/text()').extract()
    stock['time'] = response.xpath('//*[@id="qwidget_markettime"]/text()').extract()
    return stock

To run the script, I use the command line:

scrapy crawl ugaz -o stocks.csv

Any and all help is greatly appreciated.

DrJP
  • Some websites block scraping. I believe nasdaq is one of them, but I'm not 100% sure. – reticentroot May 04 '15 at 03:16
  • Try changing the `User-Agent` to a Chrome or Firefox one, following the instructions here: http://stackoverflow.com/questions/18920930/scrapy-python-set-up-user-agent – number5 May 04 '15 at 03:25
  • Could you please add the proper start URL? This start URL will give you only a single item to yield, and for that you don't have to write a spider. – Jithin May 04 '15 at 05:16
  • What is the output of the scrapy command? The code runs fine when you indent the parse block. – Frank Martin May 04 '15 at 05:56

1 Answer

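You need to indent the parse block. As posted, parse is defined at module level, outside the StockSpider class, so it is not a method of the spider.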

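import scrapy
# StockItem also has to be imported from the project's items module;
# the exact import path depends on the project name, which the question does not give.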

class StockSpider(scrapy.Spider):
    name = "ugaz"
    allowed_domains = ["nasdaq.com"]
    start_urls = ["http://www.nasdaq.com/symbol/ugaz/"]

    # Indent this block so that parse() is a method of StockSpider
    def parse(self, response):
        stock = StockItem()
        stock['quote'] = response.xpath('//*[@id="qwidget_lastsale"]/text()').extract()
        stock['time'] = response.xpath('//*[@id="qwidget_markettime"]/text()').extract()
        return stock
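
A minimal sketch of why the indentation matters, using plain Python classes instead of Scrapy so it runs on its own. The names Base, Unindented, Indented and try_parse are hypothetical, and Base is only an assumed stand-in for how a base spider with no usable parse() might behave:

```python
class Base:
    """Hypothetical stand-in for a base spider whose parse() is not implemented."""

    def parse(self, response):
        raise NotImplementedError("parse() is not defined on this class")


class Unindented(Base):
    name = "unindented"


# Defined at module level, so it is NOT a method of Unindented.
def parse(self, response):
    return {"quote": response}


class Indented(Base):
    name = "indented"

    # Defined inside the class body, so it overrides Base.parse.
    def parse(self, response):
        return {"quote": response}


def try_parse(spider, response):
    """Call spider.parse and return None if only the base version exists."""
    try:
        return spider.parse(response)
    except NotImplementedError:
        return None


if __name__ == "__main__":
    print(try_parse(Unindented(), "1.23"))
    print(try_parse(Indented(), "1.23"))
```

In this sketch the unindented version never reaches the module-level parse function, because attribute lookup on the instance only finds the base class method. The question's spider looks like the same situation.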
Frank Martin
  • Thank you, Frank! I'm new to Python and didn't realize the importance of proper indentation. – DrJP May 04 '15 at 22:54