I am trying to scrape stock prices from Google Finance using Scrapy. The code does not raise any errors, but the output file comes out blank.

The code is below:

import scrapy

bse_list=['quote/ABB:NSE','quote/AEGISLOG:NSE','quote/AMARAJABAT:NSE','quote/AMBALALSA:NSE','quote/HDFC:NSE','quote/ANDHRAPET:NSE','quote/ANSALAPI:NSE']

class CrawlSpider(scrapy.Spider):
    name = 'crawl'
    allowed_domains = ['www.google.com/finance/']
    start_urls = ['https://google.com/finance/']

    def parse(self, response):
        for stock in bse_list:
            url_new = response.urljoin(stock)
            yield scrapy.Request(url_new, callback = self.parse_book)

    def parse_book(self, response):
        stock_name = response.xpath('//*[@class="zzDege"]/text()').extract_first()
        current_price = response.xpath('//*[@class="YMlKec fxKbKc"]/text()').extract_first()
        stock_info = response.xpath('//*[@class="P6K39c"]/text()').extract()

        last_closing_price = stock_info[0]
        day_range = stock_info[1]
        year_range = stock_info[2]
        market_cap = stock_info[3]
        p_e_ratio = stock_inf[4]

        yield {
            "stock_name": stock_name,
            "current_price": current_price,
            "last_closing_price": last_closing_price,
            "day_range": day_range,
            "year_range": year_range,
            "market_cap": market_cap,
            "p_e_ratio": p_e_ratio
        }
Anikan

1 Answer

The problem is in the stock info selection; the rest of your code works fine. With the stock info lines commented out, the spider returns the name and current price:

import scrapy

bse_list = ['quote/ABB:NSE', 'quote/AEGISLOG:NSE', 'quote/AMARAJABAT:NSE',
            'quote/AMBALALSA:NSE', 'quote/HDFC:NSE', 'quote/ANDHRAPET:NSE', 'quote/ANSALAPI:NSE']


class CrlSpider(scrapy.Spider):
    name = 'crl'
   
    start_urls = ['https://google.com/finance/']


    def parse(self, response):
        for stock in bse_list:
            url_new = response.urljoin(stock)
            yield scrapy.Request(url_new, callback=self.parse_book)


    def parse_book(self, response):
        stock_name = response.xpath('//*[@class="zzDege"]/text()').extract_first()
        current_price = response.xpath('//*[@class="YMlKec fxKbKc"]/text()').extract_first()
        #stock_info = response.xpath('//*[@class="P6K39c"]/text()').extract()

        #last_closing_price = stock_info[0]
        # day_range = stock_info[1]
        # year_range = stock_info[2]
        # market_cap = stock_info[3]
        # p_e_ratio = stock_inf[4]

        yield {
            "stock_name": stock_name,
            "current_price": current_price,
            #"last_closing_price": last_closing_price,
            # "day_range": day_range,
            # "year_range": year_range,
            # "market_cap": market_cap,
            # "p_e_ratio": p_e_ratio
        }

Output:

{'stock_name': 'Ansal Properties and Infrastructure Ltd', 'current_price': '₹13.30'}       
2021-11-15 20:18:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.google.com/finance/quote/ANDHRAPET:NSE> (referer: https://www.google.com/finance/)
2021-11-15 20:18:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.google.com/finance/quote/AMBALALSA:NSE> (referer: https://www.google.com/finance/)
2021-11-15 20:18:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.google.com/finance/quote/AEGISLOG:NSE> (referer: https://www.google.com/finance/)
2021-11-15 20:18:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.google.com/finance/quote/ABB:NSE> (referer: https://www.google.com/finance/)
2021-11-15 20:18:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.google.com/finance/quote/HDFC:NSE> (referer: https://www.google.com/finance/)
2021-11-15 20:18:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.google.com/finance/quote/AMARAJABAT:NSE> (referer: https://www.google.com/finance/)
2021-11-15 20:18:09 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.google.com/finance/quote/ANDHRAPET:NSE>
{'stock_name': None, 'current_price': None}
2021-11-15 20:18:09 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.google.com/finance/quote/AMBALALSA:NSE>
{'stock_name': None, 'current_price': None}
2021-11-15 20:18:09 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.google.com/finance/quote/AEGISLOG:NSE>
{'stock_name': None, 'current_price': None}
2021-11-15 20:18:09 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.google.com/finance/quote/ABB:NSE>
{'stock_name': 'ABB India Ltd', 'current_price': '₹2,139.00'}
2021-11-15 20:18:09 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.google.com/finance/quote/HDFC:NSE>
{'stock_name': 'Housing Development Finance Corp Ltd', 'current_price': '₹2,994.15'}       
2021-11-15 20:18:09 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.google.com/finance/quote/AMARAJABAT:NSE>
{'stock_name': 'Amara Raja Batteries Ltd', 'current_price': '₹685.40'}
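
Several items in the log above come back as None, and the asker's later comment attributes the stock info failure to links that were not valid. On such a page the `P6K39c` XPath presumably returns fewer than five strings, so `stock_info[0]` and the lines after it would raise an IndexError. Here is a minimal sketch of one way to guard against that, assuming the field order used in the question; `label_stock_info` and `FIELDS` are hypothetical names, not part of Scrapy:

```python
from itertools import chain, islice, repeat

# Field order is an assumption taken from the question's parse_book.
FIELDS = (
    "last_closing_price",
    "day_range",
    "year_range",
    "market_cap",
    "p_e_ratio",
)


def label_stock_info(values, fields=FIELDS):
    """Pair each field name with the value at the same position.

    Missing positions are filled with None, so a short or empty list
    never raises IndexError.
    """
    padded = islice(chain(values, repeat(None)), len(fields))
    return dict(zip(fields, padded))


# Placeholder strings stand in for scraped text.
complete = label_stock_info(["v1", "v2", "v3", "v4", "v5"])
partial = label_stock_info(["v1", "v2"])
empty = label_stock_info([])
```

Inside `parse_book`, the result could be merged into the yielded item with `**label_stock_info(stock_info)`, so an invalid ticker yields None fields instead of aborting the callback.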
Md. Fazlul Hoque
  • Thank you so much for this. It made me realise that the iterable part of the code was incorrect, and that having the allowed_domains field was preventing this from working. Would you know why that would be the case? Do you think the site was redirecting my crawler to a different domain, and that was the reason? Sorry about the ignorance, but I am a noob, so I thought it would be best to ask. – Anikan Nov 15 '21 at 16:22
  • @Anikan, thanks. Actually, it's hard to analyze without the followed URL. – Md. Fazlul Hoque Nov 15 '21 at 16:33
  • Sure, no issues. I also got the entire thing to work after a little tinkering; the whole issue with stock info was happening because some of the links were not valid. – Anikan Nov 17 '21 at 13:07
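
On the allowed_domains question raised in the comments: Scrapy's documentation describes `allowed_domains` as a list of domain names, and requests to hosts outside those domains (or their subdomains) are dropped by the offsite middleware. The value in the question, `'www.google.com/finance/'`, contains a path, so it likely never matches the host `www.google.com`, and every request yielded from `parse` is filtered without raising an error. The sketch below is a simplified stand-in for that host check, not Scrapy's actual code; `host_allowed` is a hypothetical helper:

```python
import re
from urllib.parse import urlparse


def host_allowed(url, allowed_domains):
    """Return True if the URL's host equals, or is a subdomain of, an allowed entry.

    Simplified illustration only: entries are compared against the host
    name, never against the path.
    """
    if not allowed_domains:
        return True  # no restriction configured
    host = urlparse(url).hostname or ""
    alternatives = "|".join(re.escape(domain) for domain in allowed_domains)
    pattern = r"^(.*\.)?(%s)$" % alternatives
    return re.search(pattern, host) is not None


quote_url = "https://www.google.com/finance/quote/ABB:NSE"

# Entry from the question: includes a path, so it cannot equal any host name.
blocked = host_allowed(quote_url, ["www.google.com/finance/"])

# Bare domain: matches www.google.com as a subdomain of google.com.
allowed = host_allowed(quote_url, ["google.com"])
```

Under that assumption, `allowed_domains = ['google.com']` would let the quote pages through, which is consistent with the spider working once the field was removed.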