Hello dear internauts!
I want to extract the country distribution from: https://www.etf.com/URTH - "URTH Top 10 Countries" halfway down the page. I intend to expand to other ETFs thereafter: So far, the only things I can imagine are ugly:
- Searching for the label name. Ugly because it would require searching for every country in the world, just to find the ten that are actually on there.
=IF(NOT(ISBLANK($A$1)),(importxml(concatenate("https://www.etf.com/",$A$1), "(//label[normalize-space()='United States']//following::span[@id=''][1])")),"")
- Extracting the entire HTML and then beginning some series of string operations. Ugly because seems inherently instable.
=IF(NOT(ISBLANK($A$1)),index(IMPORTXML(concatenate("https://www.etf.com/",$A$1,"#overview"), "html"),1,2))
Other things I've tried: Copying the xpath / full xpath (with and without javascript disabled). As well as trying to create my own xpath (been a few hours :D)
One added hurdled I see is that, with javascript off, there aren't many IDs, and there are other lists nearby with the same @class structure.
Desired result: United States 68.34% Japan 6.63% United Kingdom 4.29% etc. Each country in a single or two cells (country name / percentage).
Some insight in how you proceeded to build the functioning query would be really welcome. If there is a completely different approach, I'm all ears!
Note: A1=URTH