
I am trying to scrape some basic fundamental data from Yahoo Finance. I used to be able to do it with BeautifulSoup, but it no longer works. Is it time to use Selenium?

url = https://finance.yahoo.com/quote/2638.HK/key-statistics?p=2638.HK

How do I grab the value for "Trailing P/E", "Return on Equity (ttm)", or any other field?

Nimantha
MK Mou

1 Answer


You could still use BeautifulSoup for that:

from bs4 import BeautifulSoup
from requests import get

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}

url = 'https://finance.yahoo.com/quote/2638.HK/key-statistics?p=2638.HK'
response = get(url, headers=headers)

soup = BeautifulSoup(response.text, 'html.parser')


data = soup.find_all("table")

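def grab_by_argument(table_array, parameter):
    # Return the value cell of the first row whose label contains `parameter`
    for table in table_array:
        for tr in table.find_all('tr'):
            tds = tr.find_all('td')
            # Skip header or malformed rows that lack a label/value pair
            if len(tds) < 2:
                continue
            if parameter.lower() in tds[0].get_text().lower():
                return tds[1].get_text()
    return None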


print(grab_by_argument(data, "Trailing P/E"))

If you have multiple parameters to scrape, put them in a list and iterate over it:

args = ["Trailing P/E", "Return on Equity (ttm)"]

for arg in args:
    print(grab_by_argument(data, arg))

If you need to grab all stock data:

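def grab_all(stock_data):
    # Print every measure/value pair found in the tables
    for table in stock_data:
        for tr in table.find_all('tr'):
            tds = tr.find_all('td')
            # Skip header or malformed rows that lack a label/value pair
            if len(tds) < 2:
                continue
            print("Measure: {}".format(tds[0].get_text()))
            print("Value: {}".format(tds[1].get_text()))
            print("")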

and then call it later with:

grab_all(data)

You could also return all the information as a table instead of printing it.
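As a rough sketch of returning the data rather than printing it, the function below collects every measure/value pair into a dict. The name `collect_all` and the sample markup are made up for illustration; the sample values are placeholders, not real Yahoo figures, and the real page layout may differ.

```python
from bs4 import BeautifulSoup


def collect_all(tables):
    """Return {measure: value} for every row that has at least two <td> cells."""
    results = {}
    for table in tables:
        for tr in table.find_all("tr"):
            tds = tr.find_all("td")
            # Skip header or malformed rows that lack a label/value pair
            if len(tds) < 2:
                continue
            measure = tds[0].get_text(strip=True)
            value = tds[1].get_text(strip=True)
            results[measure] = value
    return results


# Hypothetical sample markup standing in for the downloaded page
# (assumption: statistics are laid out as two-cell table rows).
sample_html = """
<table>
  <tr><th>Measure</th><th>Value</th></tr>
  <tr><td>Trailing P/E</td><td>12.34</td></tr>
  <tr><td>Return on Equity (ttm)</td><td>5.67%</td></tr>
</table>
"""

sample_tables = BeautifulSoup(sample_html, "html.parser").find_all("table")
stats = collect_all(sample_tables)
print(stats["Trailing P/E"])
```

With the real page you would pass `data` from above in place of `sample_tables`, then look up any field by its exact label.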

Nimantha