I am parsing this page with Beautiful Soup:
https://au.finance.yahoo.com/q/is?s=AAPL
I am trying to get the total revenue for 27/09/2014 (42,123,000), which is one of the first values near the top of the statement.
I inspected the element in Chrome's developer tools and found that the value is in a table with the class name yfnc_tabledata1.
My Python code is as follows:
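import requests
import bs4
# get the webpage
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")
# parse it with Beautiful Soup, naming the parser explicitly
soup = bs4.BeautifulSoup(page.content, "html.parser")
# select the table; note that select() returns a list of matching tags
tag = soup.select("table.yfnc_tabledata1")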
So far so good: this grabs the table that holds the data I need, but this is where I am stuck.
The chain that leads to the data I want is as follows:
tag > tbody > tr > td > table > tbody > (then the second tr)
But when I try to follow this chain in my code, I get an empty result.
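To make the problem reproducible without the live page, here is a minimal sketch on a made-up HTML fragment. The markup and row labels are assumptions standing in for the real page, not a copy of it. One possible explanation, which I have not confirmed against the actual page source, is that the tbody elements shown in Chrome are added by the browser and are not present in the HTML that requests downloads. The sketch shows that a selector containing tbody matches nothing when the markup has none, while the same chain without tbody reaches the second row.

```python
import bs4

# Made-up fragment standing in for the real page (an assumption).
# Note that it contains no <tbody> elements.
HTML = """
<table class="yfnc_tabledata1">
  <tr><td>
    <table>
      <tr><td>Period Ending</td><td>27/09/2014</td></tr>
      <tr><td>Total Revenue</td><td>42,123,000</td></tr>
    </table>
  </td></tr>
</table>
"""

soup = bs4.BeautifulSoup(HTML, "html.parser")

# select() returns a list of matching tags, not a single tag.
with_tbody = soup.select(
    "table.yfnc_tabledata1 > tbody > tr > td > table > tbody > tr"
)
without_tbody = soup.select("table.yfnc_tabledata1 > tr > td > table > tr")

# The markup has no <tbody>, so the first selector matches nothing.
print(len(with_tbody))

# The second row of the inner table holds the label and the value.
second_row = without_tbody[1]
cells = [td.get_text(strip=True) for td in second_row.find_all("td")]
print(cells)
```

If the real page behaves like this fragment, printing page.text and searching it for tbody would show whether the chain copied from Chrome matches the HTML that was actually downloaded.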
Can anybody help me with this?
Also, for bonus points, can anyone tell me how I can learn to extract data like this in a more general sense? I constantly need to extract data buried deep within an HTML document, and I can never seem to work out the correct code to get to the data I want.
Thanks a lot, any help is appreciated.