Morningstar xpath return empty in Google Sheet (Imported content is empty)

Question

I am trying to pull a number from the Morningstar "Cash Flow" page an arbitrary stock ticker using XPath. I have the tested the XPath on the morningstar website by an XPath tester and it returned desired values. However, when I want to use this value in a google sheet, it returns #N/A (Imported content is empty.).

=IMPORTXML("http://financials.morningstar.com/cash-flow/cf.html?t=fb&region=usa&culture=en-US", "//div[@id='data_tts1']/div")

I did a bit of research on this and find out that data in such websites generated dynamically and downloads the content in stages, Therefore, page needs to be loaded first to be able to pull any data out of it!

I'm wondering if there is any solution to this issue? You help would much be appreciated.

score 0 · Answer 1 · answered Mar 22 '20 at 16:29

0

it's empty as it should be because the content you are trying to scrape is of JavaScript origin. Google Sheets does not support imports of JS elements. you can always test this by disabling JS for a given site and only what's left can be scraped:

answered Mar 22 '20 at 16:29

player0

124,011
12
67
124

score 0 · Answer 2 · answered Mar 24 '20 at 04:40

It might be possible. But you have to prepare a custom sheet to extract the data. Use IMPORTDATA to parse the .json which contains the data :

http://financials.morningstar.com/ajax/ReportProcess4HtmlAjax.html?&t=XNAS:FB&region=usa&culture=en-US&cur=&reportType=cf&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=672024&callback=jsonp1585016592836&_=1585016593002

AFAIK, you couldn't import directly the .csv version (specific headers needed, so curl or other specific tools would be required).

http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:FB&region=usa&culture=en-US&cur=&reportType=cf&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=764423&denominatorView=raw&number=3

Since this .json is very special (contains html tags), i don't think a custom script for GoogleSheets could import it correctly. So once the .json is loaded in GoogleSheets, TRANSPOSE the rows to columns and use formulas to locate your data (target the cells which contain data_s1 and data_s2 for example). Use CONCAT to merge the cells of interest. Then split the result into columns (use a custom separator). SEARCH for the data you want and clean the results with SUBSTITUTE. The method is dirty but i think it could be automated for the whole process.

Thanks a lot. would you kindly provide an example so I can modify and use. — Ulf, Mar 25 '20 at 19:08
Sure. Proof of concept here : https://docs.google.com/spreadsheets/d/1ToC4S9wctRG5lm6JPmBgWYgxEWKBlcpdnPGO5mtmkvM/edit?usp=sharing credits to @max-makhrov for the regex part. Check the yellow cells for the formulas. The result starts in G33. — E.Wiest, Mar 28 '20 at 02:49

Morningstar xpath return empty in Google Sheet (Imported content is empty)

2 Answers2