
I am trying to get ESG data from the Yahoo Finance website, but BeautifulSoup does not seem to see the entire HTML page.

import requests
from bs4 import BeautifulSoup
 
url = requests.get("https://uk.finance.yahoo.com/quote/XOM/sustainability?p=XOM&.tsrc=fin-srch").content
soup = BeautifulSoup(url, 'html.parser') 
print(soup)
  • The data is likely loaded via JavaScript, which neither `requests` nor `BeautifulSoup` is capable of executing. – esqew Jul 03 '22 at 01:42
  • You might find it easier to use a Python library such as [yfinance](https://pypi.org/project/yfinance/) to get the information – Martin Evans Jul 05 '22 at 11:04
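Before reaching for a browser-automation tool, it can be worth checking whether the data is really missing from the HTML that requests receives, or is already embedded in a script. The sketch below uses only the standard library's html.parser to collect the text of every <script> element and search it for a field name. The helper names, the sample HTML and the field names are hypothetical stand-ins, not taken from the real page.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the raw text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append each one
        if self._in_script:
            self.scripts[-1] += data


def embedded_in_static_html(page_html, field_name):
    """Return True if field_name appears inside any <script> in the raw HTML."""
    collector = ScriptCollector()
    collector.feed(page_html)
    collector.close()
    return any(field_name in script for script in collector.scripts)


# Hypothetical page snippet standing in for requests.get(...).text
sample = '<html><script>root.App.main = {"esgScores": {}};</script></html>'
print(embedded_in_static_html(sample, "esgScores"))  # True
print(embedded_in_static_html(sample, "totalEsg"))   # False
```

If the field name is found, the data can be parsed straight out of the HTML; if not, it is probably fetched later by JavaScript.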

1 Answer


The ESG information you want is stored as JSON inside a <script> tag in the HTML that is returned, so it can be extracted without running any JavaScript.

  1. First, locate the correct script tag, i.e. the one containing the JSON text, and extract its contents. Python's str.find() can then be used to locate the start and end of the JSON within that text.

  2. The JSON text can then be converted into a Python data structure using Python's json.loads() function.

  3. All of the data can now be accessed using standard Python list/dictionary notation. The ESG scores are buried quite deep inside the structure, so I would recommend first printing out just the JSON and using an online tool to format it. Such tools can also show you the 'path' needed to access any item in the JSON.
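If an online tool is not convenient, a small recursive search can report the path to a key directly. This is a sketch, not part of the original answer: find_key_paths and get_path are hypothetical helper names, and the data dict is a heavily trimmed stand-in for the real page JSON, built around the path the answer uses.

```python
def find_key_paths(obj, target, path=()):
    """Yield every path (as a tuple of keys/indexes) that ends at key `target`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            current = path + (key,)
            if key == target:
                yield current
            yield from find_key_paths(value, target, current)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_key_paths(item, target, path + (index,))


def get_path(obj, path):
    """Follow a path produced by find_key_paths and return the value there."""
    for key in path:
        obj = obj[key]
    return obj


# Hypothetical, heavily trimmed stand-in for the decoded page data
data = {"context": {"dispatcher": {"stores": {"QuoteSummaryStore": {
    "esgScores": {"totalEsg": {"raw": 36.46, "fmt": "36.5"}}}}}}}

paths = list(find_key_paths(data, "esgScores"))
print(paths)  # [('context', 'dispatcher', 'stores', 'QuoteSummaryStore', 'esgScores')]
print(get_path(data, paths[0])["totalEsg"]["raw"])  # 36.46
```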

The ESG scores can then be accessed as follows:

from bs4 import BeautifulSoup
import requests
import json

# Send a browser-like User-Agent; Yahoo may reject or alter responses for the default one
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
req = requests.get("https://uk.finance.yahoo.com/quote/XOM/sustainability?p=XOM&.tsrc=fin-srch", headers=headers)
req.raise_for_status()
soup = BeautifulSoup(req.content, 'html.parser')

marker = "root.App.main = "

for script in soup.find_all('script'):
    # Only the script that assigns root.App.main holds the page data
    if script.string and marker in script.string:
        start = script.string.find(marker) + len(marker)
        end = script.string.find("\n", start)
        if end == -1:
            end = len(script.string)
        # The assignment ends with ";" before the newline, so strip it off
        data = json.loads(script.string[start:end].rstrip().rstrip(';'))
        esg_scores = data['context']['dispatcher']['stores']['QuoteSummaryStore']['esgScores']

        for key, value in esg_scores.items():
            print(f"{key:40}  {value}")

        break
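The find("\n") approach assumes the JSON sits on a single line ending in ';'. A slightly more tolerant alternative, sketched here, uses json.JSONDecoder.raw_decode, which parses the first complete JSON value in a string and reports where it ended, so any JavaScript that follows it is ignored. extract_assigned_json is a hypothetical helper name and the script text is made up for illustration.

```python
import json


def extract_assigned_json(script_text, marker="root.App.main = "):
    """Return the JSON value assigned right after `marker`, or None if absent.

    raw_decode() stops at the end of the first complete JSON value, so a
    trailing ';' or further JavaScript on the same line does not matter.
    """
    start = script_text.find(marker)
    if start == -1:
        return None
    remainder = script_text[start + len(marker):].lstrip()
    value, _end = json.JSONDecoder().raw_decode(remainder)
    return value


# Hypothetical script text: the JSON is followed by more code on the same line
script_text = 'var x = 1;\nroot.App.main = {"context": {"a": [1, 2]}}; init(root);\n'
print(extract_assigned_json(script_text))  # {'context': {'a': [1, 2]}}
```

In the loop above, data = extract_assigned_json(script.string) could replace the find/slice/json.loads lines.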

This shows the available ESG data as:

palmOil                                   False
peerSocialPerformance                     {'min': 2.04, 'avg': 10.441525423728814, 'max': 19.64}
controversialWeapons                      False
ratingMonth                               5
gambling                                  False
socialScore                               {'raw': 9.82, 'fmt': '9.8'}
nuclear                                   False
furLeather                                False
alcoholic                                 False
gmo                                       False
catholic                                  False
socialPercentile                          None
peerGovernancePerformance                 {'min': 4.73, 'avg': 8.467627118644069, 'max': 13.63}
peerCount                                 66
relatedControversy                        ['Operations Incidents']
governanceScore                           {'raw': 8.14, 'fmt': '8.1'}
environmentPercentile                     None
animalTesting                             True
peerEsgScorePerformance                   {'min': 8.24, 'avg': 37.78166666666667, 'max': 58.64}
tobacco                                   False
totalEsg                                  {'raw': 36.46, 'fmt': '36.5'}
highestControversy                        3
esgPerformance                            OUT_PERF
coal                                      False
peerHighestControversyPerformance         {'min': 0, 'avg': 2.0454545454545454, 'max': 5}
pesticides                                False
adult                                     False
ratingYear                                2022
maxAge                                    86400
percentile                                {'raw': 81.8, 'fmt': '82'}
peerGroup                                 Oil & Gas Producers
smallArms                                 False
peerEnvironmentPerformance                {'min': 0.12, 'avg': 18.42101694915254, 'max': 26.75}
environmentScore                          {'raw': 18.51, 'fmt': '18.5'}
governancePercentile                      None
militaryContract                          False
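Many of the numeric scores come as {'raw': ..., 'fmt': ...} pairs. If you only want the numbers, a small sketch like the following keeps the 'raw' value and leaves everything else (booleans, strings, lists, peer min/avg/max dicts) untouched. simplify_scores is a hypothetical helper name, and the input is a subset of the output listed above.

```python
def simplify_scores(esg_scores):
    """Replace {'raw': ..., 'fmt': ...} dicts with their raw value; keep other values."""
    return {
        key: value["raw"] if isinstance(value, dict) and "raw" in value else value
        for key, value in esg_scores.items()
    }


# A few entries copied from the output above
esg_scores = {
    "totalEsg": {"raw": 36.46, "fmt": "36.5"},
    "environmentScore": {"raw": 18.51, "fmt": "18.5"},
    "socialScore": {"raw": 9.82, "fmt": "9.8"},
    "governanceScore": {"raw": 8.14, "fmt": "8.1"},
    "peerEsgScorePerformance": {"min": 8.24, "avg": 37.78166666666667, "max": 58.64},
    "peerGroup": "Oil & Gas Producers",
    "palmOil": False,
}

for key, value in simplify_scores(esg_scores).items():
    print(f"{key:40}  {value}")
```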

I suggest you print(data) to better understand all of the information available.
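Because the decoded structure is very large, printing it with indentation makes it far easier to read locally, as an alternative to an online formatter. A minimal sketch with the standard json module; the data dict is a hypothetical, trimmed stand-in for the real one.

```python
import json

# Hypothetical, trimmed stand-in for the decoded page data
data = {"context": {"dispatcher": {"stores": {"QuoteSummaryStore": {
    "esgScores": {"peerGroup": "Oil & Gas Producers", "ratingYear": 2022}}}}}}

# indent=2 puts one key per line; sort_keys makes large objects easier to scan
pretty = json.dumps(data, indent=2, sort_keys=True)
print(pretty)
```

json.dump(data, f, indent=2) writes the same layout to an open file if you prefer to inspect it in an editor.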

Martin Evans