-1

I'm looking to parse a specific Yahoo stock page using a Python script (take https://finance.yahoo.com/quote/NOA?ltr=1 for example) and print the "Recommended Rating" to a file. The recommended rating can be found on the right-hand side of the page, about halfway down.

This is what I have so far:

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2
from bs4 import BeautifulSoup
quote_page = 'https://finance.yahoo.com/quote/NOA?ltr=1'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, "html.parser")
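# Search for a <div> tag by its class attribute (the original call looked for an attribute named 'div')
name_box = soup.find('div', attrs={'class': 'rating-text Arrow South Fw(b) Bgc($strongBuy) Bdtc($strongBuy)'})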
name = name_box.text.strip()
print(name)

The tricky part is that I believe the recommended rating is only present on the page as innerHTML. I'm not sure how I'd go about retrieving this data; a push in the right direction would be greatly appreciated!

t_wimms
  • What have you tried? Did you try [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) in particular? – 9000 Feb 14 '17 at 21:42
  • I was using lxml, would you recommend BeautifulSoup? – t_wimms Feb 14 '17 at 21:43
  • Don't Yahoo have an API? It is much more reliable to use an API than to scrape, since APIs are meant to be used by automated systems, and websites generally are not. You could build a great scraper and find you get yourself blocked by Yahoo. – halfer Feb 14 '17 at 21:46
  • @t_wimms: Real HTML is messy, and BeautifulSoup handles this messiness much better, while lxml expects well-formed XML. – 9000 Feb 14 '17 at 21:47
  • @halfer From what I just read, they don't have a Financial API available. – t_wimms Feb 14 '17 at 21:54

2 Answers

3

Yahoo makes a GET request to the URL in the script below for some of its data. If you open the Network tab of your browser's developer tools and refresh the page for the NOA stock, you should see a request named 'NOA?formatt...'. Click it and view the response to see some of the data. You'll need the requests module for the script below to work: pip install requests.

# get_mean_recs.py
import csv
from datetime import datetime
import requests
import sys

def get_date():
    # Current UTC date formatted as DD-MM-YYYY
    return datetime.utcnow().strftime('%d-%m-%Y')

lhs_url = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/'
rhs_url = '?formatted=true&crumb=swg7qs5y9UP&lang=en-US&region=US&' \
          'modules=upgradeDowngradeHistory,recommendationTrend,' \
          'financialData,earningsHistory,earningsTrend,industryTrend&' \
          'corsDomain=finance.yahoo.com'

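def get_mean_rec(ticker):
    # Returns the formatted mean recommendation, or -1 if the HTTP request is not OK
    url = lhs_url + ticker + rhs_url
    r = requests.get(url, timeout=10)  # timeout so a stalled request cannot hang the script
    if not r.ok:
        return -1
    result = r.json()['quoteSummary']['result'][0]
    return result['financialData']['recommendationMean']['fmt']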

def read_from_csv(fn):
    with open(fn, 'r') as f:
        reader = csv.reader(f)
        for line in reader:
            for ticker in line:
                yield ticker

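def write_to_csv(fn, data):
    if not data:
        return  # nothing to write; also avoids an IndexError on data[0]
    # newline='' keeps the csv module from writing blank rows on Windows (Python 3)
    with open(fn, 'a', newline='') as f:
        fieldnames = list(data[0].keys())
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        for item in data:
            writer.writerow(item)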

def assemble_dict(ticker):
    return {
        'ticker': ticker,
        'mean_rec': get_mean_rec(ticker),
        'utc_date': get_date()
    }

def main():
    in_fn = sys.argv[1]
    out_fn = sys.argv[2]
    data = [assemble_dict(ticker) for ticker in read_from_csv(in_fn)]
    write_to_csv(out_fn, data)

if __name__ == '__main__':
    main()

Usage:

python get_mean_recs.py input.csv output.csv
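
To pull other fields out of the same response, one option is to pretty-print the parsed JSON and then follow the keys and indices down to the value you want. Here is a minimal sketch of that idea. It runs on a hand-written stand-in for r.json() (only the nesting is taken from the script above; the value is a placeholder, not real Yahoo data), and dig is a hypothetical helper written for this example, not part of requests:

```python
from pprint import pprint

# Hypothetical stand-in for r.json(); the nesting mirrors the keys used in
# get_mean_rec above, the value itself is a placeholder.
sample = {
    'quoteSummary': {
        'result': [
            {'financialData': {'recommendationMean': {'fmt': '2.00'}}}
        ]
    }
}


def dig(obj, path, default=None):
    """Follow a sequence of dict keys / list indices; return default on a miss."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Print the whole structure to see which keys are available
pprint(sample)

mean_rec = dig(sample, ['quoteSummary', 'result', 0,
                        'financialData', 'recommendationMean', 'fmt'])
missing = dig(sample, ['quoteSummary', 'result', 0, 'noSuchModule', 'fmt'],
              default='n/a')
print(mean_rec, missing)
```

Returning a default instead of raising keeps one missing field from aborting a whole batch of tickers; whether that is the right trade-off depends on how you want failures to show up in the output file.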
Jonathan
  • Thanks! That is exactly what I was looking for! – t_wimms Feb 14 '17 at 23:16
  • I was able to figure out the CSV, but I'm curious how you knew how to format the link that gets the recommended ratings. I'm looking to add more information to the list and I'm trying to figure out how to modify these URLs to get the right info. Thanks! – t_wimms Feb 15 '17 at 23:16
  • You don't need to modify the url. Just pprint the object that is returned from r.json() to see what else is available. Then follow the keys/indices to the value you want. – Jonathan Feb 16 '17 at 00:22
  • I ran curl with the URL from this post, curl https://query2.finance.yahoo.com/v10/finance/quoteSummary/APPL?formatted=true&crumb=swg7qs5y9UP&lang=en-US&region=US&modules=upgradeDowngradeHistory,recommendationTrend,financialData,earningsHistory,earningsTrend,industryTrend&corsDomain=finance.yahoo.com and got an error, {"finance":{"result":null,"error":{"code":"Bad Request","description":"Missing required query parameter=modules"}}} . Has the URL changed in recent years? I know this was posted a few years ago. – vic Feb 19 '23 at 20:12
0

There is an API for accessing the Yahoo Finance information, e.g.

http://finance.yahoo.com/d/quotes.csv?s=NOA&f=snd1l1yr

I think you may be better off using that to fetch the required information. Some more info on the parameters can be found here: http://wern-ancheta.com/blog/2015/04/05/getting-started-with-the-yahoo-finance-api/
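
As a rough sketch of how a response in that format could be handled: the code below parses one CSV row and maps it to named fields. It does not call the endpoint, which may no longer respond. The meanings assigned to the tags in f=snd1l1yr (symbol, name, last trade date, last trade price, dividend yield, P/E ratio) are my assumption about the format, and the sample row is made up for illustration:

```python
import csv
import io

# Assumed meanings of the tags in f=snd1l1yr, in order; not confirmed by this post.
FIELD_NAMES = ['symbol', 'name', 'last_trade_date',
               'last_trade_price', 'dividend_yield', 'pe_ratio']


def parse_quotes_csv(text):
    """Turn CSV text (one row per symbol) into a list of dicts keyed by FIELD_NAMES."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(FIELD_NAMES, row)) for row in reader if row]


# Made-up sample row; every value here is a placeholder, not real market data.
sample_text = '"NOA","Example Company Name","1/1/2000",1.23,0.50,10.00\n'

quotes = parse_quotes_csv(sample_text)
print(quotes[0]['symbol'], quotes[0]['last_trade_price'])
```

All values come back as strings, so convert with float() where you need numbers.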

niallhaslam
  • I'm unable to find a list of the information that can be pulled with this API. Do you happen to have a link? Thanks! – t_wimms Feb 14 '17 at 22:04