Web scraping JS content with Python (Yahoo Finance)

Question

I am currently struggling with this Yahoo Finance page: https://sg.finance.yahoo.com/quote/1B0.SI/history?period1=1426780800&period2=1489939200&interval=div%7Csplit&filter=split&frequency=1mo

I need to get the date and ratio of each stock split, but I dug into the JSON data embedded in the page and do not see any of this information!

I'm using the script mentioned in "How to understand this raw HTML of Yahoo! Finance when retrieving data using Python?":

import json
import re
from pprint import pprint as pp

import requests
from bs4 import BeautifulSoup

url = 'https://sg.finance.yahoo.com/quote/1B0.SI/history?period1=1426780800&period2=1489939200&interval=div%7Csplit&filter=split&frequency=1mo'
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# Find the inline <script> that assigns the page state to root.App.main.
script = soup.find("script", text=re.compile(r"root\.App\.main")).text
# Raw strings keep \s and \{ as regex escapes rather than string escapes.
data = json.loads(re.search(r"root\.App\.main\s+=\s+(\{.*\})", script).group(1))
stores = data["context"]["dispatcher"]["stores"]
pp(stores)

Please let me know if you have any idea where I can find it.

Thanks!

You have a module for Yahoo Finance if you want, https://pypi.python.org/pypi/yahoo-finance — Anthony, Mar 27 '17 at 08:03
@AnthonyPradal Thank you, to be honest I did not know the API, but I don't see any method to get Stock Splits.. I would also like to solve this problem, it should be useful to me in other projects. For example, the solution cited here http://stackoverflow.com/questions/21445966/scraping-javascript-with-python-and-selenium-webdriver is really powerful in many cases, but in mine there is no such iFrame :( — Spikist, Mar 27 '17 at 10:23
Do you want the information displayed in the table with the dates Oct 11, 2016 and Feb 25, 2016? — Bill Bell, Mar 27 '17 at 15:49

score 0 · Accepted Answer · answered Mar 27 '17 at 16:10

My guess is that you can do this with Selenium, as follows.

>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('https://sg.finance.yahoo.com/quote/1B0.SI/history?period1=1426780800&period2=1489939200&interval=div%7Csplit&filter=split&frequency=1mo')
>>> driver.get('https://sg.finance.yahoo.com/quote/1B0.SI/history?period1=1426780800&period2=1489939200&interval=div%7Csplit&filter=split&frequency=1mo')
>>> tableRows = driver.find_elements_by_xpath('//tr')
>>> len(tableRows)
5
>>> tableRows[1].text
'Date Open High Low Close Adj close* Volume'
>>> tableRows[2].text
'Oct 11, 2016 2/1 Stock split'
>>> tableRows[3].text
'Feb 25, 2016 2/1 Stock split'
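
The row texts shown above have a regular shape, so the date and ratio can be pulled out with a small parser. The following is a minimal sketch, not part of the original session: the function name `parse_split_row` and the pattern are assumptions based only on the two rows displayed here, and other pages may format the ratio differently.

```python
import re
from datetime import datetime

# Matches row text such as 'Oct 11, 2016 2/1 Stock split'.
# The pattern is an assumption drawn from the two rows shown above.
SPLIT_ROW = re.compile(r"^(\w{3} \d{1,2}, \d{4})\s+(\d+)/(\d+)\s+Stock split$")


def parse_split_row(text):
    """Return (date, (first, second)) for a split row, or None otherwise.

    The two ratio numbers are returned in the order they are displayed;
    no interpretation of which side is "new" or "old" shares is made.
    """
    match = SPLIT_ROW.match(text.strip())
    if match is None:
        return None
    # %b expects English month abbreviations under the default C locale.
    split_date = datetime.strptime(match.group(1), "%b %d, %Y").date()
    return split_date, (int(match.group(2)), int(match.group(3)))
```

Rows that are not split rows, such as the header row, simply yield `None`, so the function can be mapped over every `tr` text and the `None` results filtered out.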

Notice especially that I had to load the page twice, because the first load failed. You can learn how to deal with this contingency in the Selenium documentation. (Use a try-except rather than an assert.) The main difficulty one faces in scraping this page is that one cannot see the HTML. I made the assumption that the desired content would be in a table, and that assumption proved correct.
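
The try-except approach mentioned above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the method used in the session: `load_table_rows` is a hypothetical helper, `driver` is assumed to be a Selenium 4 WebDriver (where `find_elements(by, value)` replaces the older `find_elements_by_xpath`), and the broad `except Exception` should be narrowed to Selenium's `WebDriverException` in real use.

```python
import time


def load_table_rows(driver, url, attempts=3, delay=1.0):
    """Load `url` and return the text of every <tr>, retrying on failure.

    `driver` is assumed to be a Selenium 4 WebDriver, for example
    webdriver.Chrome(). The string "xpath" is the value of By.XPATH.
    """
    last_error = None
    for _ in range(attempts):
        try:
            driver.get(url)
            rows = driver.find_elements("xpath", "//tr")
            if rows:
                return [row.text for row in rows]
        except Exception as exc:  # assumption: narrow this in real code
            last_error = exc
        # Either the load raised or no rows were found; wait and retry.
        time.sleep(delay)
    raise RuntimeError(f"no table rows after {attempts} attempts") from last_error
```

With a real browser this would be called as `load_table_rows(webdriver.Chrome(), url)`. An empty result is treated as a failed load as well, since the answer does not say whether the first load raised an error or just came back without the table.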

Thank you, that's exactly what I was looking for. So, in conclusion, are there no tips for avoiding a blind search through the HTML tags? — Spikist, Mar 28 '17 at 01:45
You might be able to do that with a headless browser in this case but that proved unnecessary. — Bill Bell, Mar 28 '17 at 05:11
I've just learned of a way. Please see http://stackoverflow.com/questions/43183736/beautifulsoup-does-not-returns-all-data/43191283#43191283. — Bill Bell, Apr 03 '17 at 18:00
