0

My importhtml and importxml functions in Google Sheets suddenly stopped working. I'm guessing it has something to do with the URL as it has been working for about a month (since I set it up) and has only recently stopped (last week). Now half the sheets is displaying #N/A unfortunately. Just wondering if someone more advanced in this area could share their insight into the problem.

IMPORTXML example:

URL: https://au.finance.yahoo.com/quote/AZJ.AX/profile?p=AZJ.AX

xPath: //*[@id='Col1-0-Profile-Proxy']/section/div[1]/div/div/p[2]/span[4]

Function returning "Resource at URL not found".

Here is a link to an example file: https://docs.google.com/spreadsheets/d/1ml3cCHIEk4vTCn3GyymXGxyCpVBgMprMzGP7OYhKPC0/edit?usp=sharing

Rubén
  • 34,714
  • 9
  • 70
  • 166
Maikelele
  • 15
  • 1
  • 4
  • Unfortunately, from your question, I cannot understand about the result values you want. I apologize for this. Can I ask you about the result values you want? – Tanaike Mar 08 '21 at 00:15
  • Yes, what is the "problem" exactly? You say that the URL stopped working, so that is the answer to the question of "why is it not working". importhtml and importxml can be temperamental though. However, if you are looking for software recommendations, maybe the best place for your question is https://superuser.com/ or https://webapps.stackexchange.com/ ? – iansedano Mar 08 '21 at 10:49
  • Thank you for replying. I apologize for the inconvenience and my poor English skill. Unfortunately, from your replying, I cannot still understand about the values you want. I apologize for this. Can you provide your result output values you expect? By this, I would like to confirm it. If you can cooperate to resolve your issue, I'm glad. Can you cooperate to do it? – Tanaike Mar 09 '21 at 00:26

3 Answers3

4

EDIT:

This looks like it may in fact be a issue on Google's side:

https://issuetracker.google.com/175144626

There has not been any resolution, but go and star that issue if it affects you or you want updates.


Maybe Yahoo has changed its site structure

I have not used the Yahoo site a lot, but maybe it has changed its format in a way that makes it unreadable for IMPORTXML. I also tested it with IMPORTHTML, with the same result. The function itself is still working with other sites though.

Testing the site you gave, disabling JavaScript to ensure the data is not dynamically generated (which makes it unreadable for IMPORTHTML), there is in fact a table:

<table class="W(100%)" data-reactid="36">
   <thead data-reactid="37">
      <tr class="C($tertiaryColor) Fz(xs) BdB Bdc($seperatorColor)" data-reactid="38">
         <th class="Ta(start) Py(6px) Fw(n)" data-reactid="39"><span data-reactid="40">Name</span></th>
         <th class="Ta(start) Py(6px) Fw(n)" data-reactid="41"><span data-reactid="42">Title</span></th>
         <th class="Ta(end) Py(6px) Fw(n)" data-reactid="43"><span data-reactid="44">Pay</span></th>
         <th class="Ta(end) Py(6px) Fw(n)" data-reactid="45"><span data-reactid="46">Exercised</span></th>
         <th class="Ta(end) Py(6px) Fw(n)" data-reactid="47"><span data-reactid="48">Year born</span></th>
      </tr>
   </thead>
   <tbody data-reactid="49">
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="50">
         <td class="Ta(start)" data-reactid="51">
            <span class="" data-reactid="52">
               <!-- react-text: 53 -->Mr. Andrew T. Harding<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="54">
            <span class="" data-reactid="55">
               <!-- react-text: 56 -->MD, CEO &amp; Director<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="57">
            <span class="" data-reactid="58">
               <!-- react-text: 59 -->2.78M<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="60"><span class="" data-reactid="61"><span data-reactid="62">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="63">
            <span class="" data-reactid="64">
               <!-- react-text: 65 -->1967<!-- /react-text -->
            </span>
         </td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="66">
         <td class="Ta(start)" data-reactid="67">
            <span class="" data-reactid="68">
               <!-- react-text: 69 -->Mr. George  Lippiatt<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="70">
            <span class="" data-reactid="71">
               <!-- react-text: 72 -->CFO &amp; Group Exec. Strategy<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="73">
            <span class="" data-reactid="74">
               <!-- react-text: 75 -->481.25k<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="76"><span class="" data-reactid="77"><span data-reactid="78">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="79"><span class="" data-reactid="80"><span data-reactid="81">N/A</span></span></td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="82">
         <td class="Ta(start)" data-reactid="83">
            <span class="" data-reactid="84">
               <!-- react-text: 85 -->Mr. Ed  McKeiver<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="86">
            <span class="" data-reactid="87">
               <!-- react-text: 88 -->Group Exec. of Coal<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="89">
            <span class="" data-reactid="90">
               <!-- react-text: 91 -->861k<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="92"><span class="" data-reactid="93"><span data-reactid="94">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="95"><span class="" data-reactid="96"><span data-reactid="97">N/A</span></span></td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="98">
         <td class="Ta(start)" data-reactid="99">
            <span class="" data-reactid="100">
               <!-- react-text: 101 -->Mr. Michael G. Carter BEng, BBus, MAICD, AFAIM<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="102">
            <span class="" data-reactid="103">
               <!-- react-text: 104 -->Group Exec. of Technical Services &amp; Planning<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="105"><span class="" data-reactid="106"><span data-reactid="107">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="108"><span class="" data-reactid="109"><span data-reactid="110">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="111"><span class="" data-reactid="112"><span data-reactid="113">N/A</span></span></td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="114">
         <td class="Ta(start)" data-reactid="115">
            <span class="" data-reactid="116">
               <!-- react-text: 117 -->Mr. Christopher L. Vagg<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="118">
            <span class="" data-reactid="119">
               <!-- react-text: 120 -->Head of Investor Relations &amp; Group Treasurer<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="121"><span class="" data-reactid="122"><span data-reactid="123">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="124"><span class="" data-reactid="125"><span data-reactid="126">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="127"><span class="" data-reactid="128"><span data-reactid="129">N/A</span></span></td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="130">
         <td class="Ta(start)" data-reactid="131">
            <span class="" data-reactid="132">
               <!-- react-text: 133 -->Mark  Hairsine<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="134">
            <span class="" data-reactid="135">
               <!-- react-text: 136 -->Mang. of Media &amp; Communications<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="137"><span class="" data-reactid="138"><span data-reactid="139">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="140"><span class="" data-reactid="141"><span data-reactid="142">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="143"><span class="" data-reactid="144"><span data-reactid="145">N/A</span></span></td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="146">
         <td class="Ta(start)" data-reactid="147">
            <span class="" data-reactid="148">
               <!-- react-text: 149 -->Ms. Tina  Thomas<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="150">
            <span class="" data-reactid="151">
               <!-- react-text: 152 -->Group Exec. of Corp.<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="153"><span class="" data-reactid="154"><span data-reactid="155">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="156"><span class="" data-reactid="157"><span data-reactid="158">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="159"><span class="" data-reactid="160"><span data-reactid="161">N/A</span></span></td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="162">
         <td class="Ta(start)" data-reactid="163">
            <span class="" data-reactid="164">
               <!-- react-text: 165 -->James  Coe<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="166">
            <span class="" data-reactid="167">
               <!-- react-text: 168 -->Mang. of Market Intelligence &amp; Sustainability<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="169"><span class="" data-reactid="170"><span data-reactid="171">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="172"><span class="" data-reactid="173"><span data-reactid="174">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="175"><span class="" data-reactid="176"><span data-reactid="177">N/A</span></span></td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="178">
         <td class="Ta(start)" data-reactid="179">
            <span class="" data-reactid="180">
               <!-- react-text: 181 -->Mr. Dominic Dupont Smith<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="182">
            <span class="" data-reactid="183">
               <!-- react-text: 184 -->Company Sec.<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="185"><span class="" data-reactid="186"><span data-reactid="187">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="188"><span class="" data-reactid="189"><span data-reactid="190">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="191">
            <span class="" data-reactid="192">
               <!-- react-text: 193 -->1964<!-- /react-text -->
            </span>
         </td>
      </tr>
      <tr class="C($primaryColor) BdB Bdc($seperatorColor) H(36px)" data-reactid="194">
         <td class="Ta(start)" data-reactid="195">
            <span class="" data-reactid="196">
               <!-- react-text: 197 -->Lowana  Riddiford<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(start) W(45%)" data-reactid="198">
            <span class="" data-reactid="199">
               <!-- react-text: 200 -->Warehouse Logistics Coordinator<!-- /react-text -->
            </span>
         </td>
         <td class="Ta(end)" data-reactid="201"><span class="" data-reactid="202"><span data-reactid="203">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="204"><span class="" data-reactid="205"><span data-reactid="206">N/A</span></span></td>
         <td class="Ta(end)" data-reactid="207"><span class="" data-reactid="208"><span data-reactid="209">N/A</span></span></td>
      </tr>
   </tbody>
</table>

The fact that there is a <span> element within each <td> should not affect IMPORTXML / IMPORTHTML as in other sites it can pick up the text content within <a> tags no problem.

However the comments for react <!-- react-text: 193 -->1964<!-- /react-text --> might be causing the Spreadsheet functions to get confused.

Their robots.txt file doesn't seem to contain any clues either:

User-agent: *
Sitemap: https://au.finance.yahoo.com/sitemaps/finance-sitemap_index_AU_en-AU.xml.gz
Disallow: /r/
Disallow: /__rapidworker-1.2.js
Disallow: /__blank
Disallow: /_td_api
Disallow: /_remote

Google's user agent is Mozilla/5.0 (compatible; GoogleDocs; apps-spreadsheets; http://docs.google.com)

So I can't say for sure what about the site makes it go wrong, but it is likely a change on the Yahoo side of things. Maybe they have rate limited Google's IP addresses


Use UrlFetchApp or better yet, an actual API

I believe you are pushing the limits of what is possible with humble Spreadsheet functions. Which is great, don't get me wrong, however, scraping with a tool that is really not meant for heavy scraping is not very reliable. Considering that web scraping is also not 100% reliable as a source of information as it is constantly subject to the scraped sites changing their HTML structure or leaving things to be dynamically generated by JavaScript. Things will eventually break.

With UrlFetchApp within Apps Script you have a lot more control and flexibility over how to scrape your data.

Though if this is a serious activity for you which requires dependability, it will pay off to use an API. APISs are designed for websites and sheets and databases to fetch data from them. They welcome it. There are some free offerings about too.

iansedano
  • 6,169
  • 2
  • 12
  • 24
  • Thank you very much! You've put extensive effort in helping, I really appreciate it. Since loading times were getting significantly slower, I am moving towards API as you suggested (I guess it's time to learn!). I can see the benefits and it will be more beneficial in the long term. Response was very inciteful. Thank you. – Maikelele Apr 07 '21 at 15:57
1

Your linked Google Sheet is nice work. These examples confirm the problem, and the timing. During the last week, numerous linking errors in previously working sheets have been reported. The linked Google Sheet summarizes some debugging examples to try to determine the problem. The debugging example Ticker debug1 'C003.F' does not link. The linking error is '#N/A' 'Error Resource at URL not found'. In most sheets the majority of the links are now faulty.

IMPORTXML example:

URL: https://de.finance.yahoo.com/quote/
XPath: "//*[@id='quote-header-info']/div[3]/div[1]/div/span[1]"
Ticker C2: 'C003.F'

Formula: =TRANSPOSE(IMPORTXML(CONCATENATE("https://de.finance.yahoo.com/quote/", C2),"//*[@id='quote-header-info']/div[3]/div[1]/div/span[1]")) The formula uses the OR '|' notation to decrease and limit the number of calls in a sheet.

Here is a link to debugging examples: https://docs.google.com/spreadsheets/d/1-xXw94LC3QL3rf-IoAAT9stwz06GTNMSV_v1a8YdWFw/edit?usp=sharing

debug1 Ticker C2: 'C003.F' does not link
debug3 Ticker: 'EXSB.DE' does link
debug6 Ticker: 'ALV.DE' does link

The debugging examples in the linked sheet demonstrate some formulas that work and others that do not. All web pages were checked and functioned.

g_gabriel
  • 11
  • 1
0

Been having the same problem for several weeks with importxml showing #N/A when it was working before - bumping for interest.