I am trying to import historical adjusted close prices from Yahoo Finance into Google Sheets.

=ImportXML("https://sg.finance.yahoo.com/quote/"&B57&"/history?p="&B57, "//tbody/tr[21]/td[6]")

Cell B57 is for example "SPY".

This works fine for historical prices up to 100 days back (the row is selected by the index in the XPath, e.g. tr[100]).

When I try to get prices more than 100 days back, it returns "N/A". These prices are visible on Yahoo Finance.

Is there a way to adjust the XPath so that it works?

I noticed that in Yahoo's HTML, the rows for prices beyond 100 days don't have this "data-reactid=1520" in the tr tag.

Rubén
Max

4 Answers

1

At present, the values you want appear to be embedded in the HTML as a JSON object used by the page's JavaScript. If that JSON object is retrieved with Google Apps Script, the values can be extracted from it. How about the following sample script?

Sample script:

Please copy and paste the following script into the script editor of your Google Spreadsheet and save it. To use it, put the custom function =SAMPLE("https://sg.finance.yahoo.com/quote/SPY/history?p=SPY") in a cell. The script then runs.

function SAMPLE(url) {
  // Extract the JSON object that the page assigns to root.App.main.
  const html = UrlFetchApp.fetch(url).getContentText().match(/root\.App\.main = ([\s\S]+?);\n/);
  if (!html || html.length == 1) return "No data";
  const tempObj = JSON.parse(html[1].trim());
  const obj = tempObj.context.dispatcher.stores;
  const header = ["date", "amount", "open", "high", "low", "close", "adjclose", "volume"];
  // Build one row per price record, with the columns in header order.
  return [header, ...obj.HistoricalPriceStore.prices
    .map(o => header.map(h => {
      if (h == "date") {
        return new Date(o[h] * 1000); // Unix seconds to Date
      } else if (h == "amount" && o[h]) {
        return `${o[h]} ${o.type}`; // amount together with its record type
      }
      return o[h];
    }))];
}
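
The two steps of the script (cutting the JSON out of the HTML, then building rows) can also be tried outside Apps Script. The following is only a minimal sketch: the helper names extractAppMain and pricesToRows, the sample HTML, and the sample values are hypothetical, and it assumes the page still contains a "root.App.main = {...};" assignment as described above.

```javascript
// Hypothetical helper: pull the object assigned to root.App.main out of raw HTML.
function extractAppMain(html) {
  const m = html.match(/root\.App\.main = ([\s\S]+?);\n/);
  return m ? JSON.parse(m[1].trim()) : null;
}

const HEADER = ["date", "amount", "open", "high", "low", "close", "adjclose", "volume"];

// Hypothetical helper: same row-building logic as SAMPLE, without the fetch.
function pricesToRows(prices) {
  const rows = prices.map(o => HEADER.map(h => {
    if (h === "date") return new Date(o[h] * 1000); // Unix seconds to Date
    if (h === "amount" && o[h]) return `${o[h]} ${o.type}`;
    return o[h]; // undefined when the record lacks this field
  }));
  return [HEADER, ...rows];
}

// Made-up HTML fragment, for illustration only.
const sampleHtml = '<script>\n(function (root) {\nroot.App.main = ' +
  '{"context":{"dispatcher":{"stores":{"HistoricalPriceStore":{"prices":[' +
  '{"date":86400,"open":1,"high":2,"low":0.5,"close":1.5,"adjclose":1.4,"volume":100},' +
  '{"date":172800,"amount":0.25,"type":"DIVIDEND"}]}}}}};\n}(this));\n</script>';

const parsed = extractAppMain(sampleHtml);
const sampleRows = pricesToRows(parsed.context.dispatcher.stores.HistoricalPriceStore.prices);
```

Keeping the parsing in functions that take a string makes it easier to notice when the page layout changes, because extractAppMain then returns null instead of throwing deep inside the custom function.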

Testing:

When this script is run with =SAMPLE("https://sg.finance.yahoo.com/quote/SPY/history?p=SPY"), the following result is obtained.

[Screenshot of the resulting sheet]

Note:

  • The above script is for a custom function. If you want to run it from the script editor instead, you can also use the following sample script.

    function myFunction() {
      const url = "https://sg.finance.yahoo.com/quote/SPY/history?p=SPY"; // This URL is from your question.
    
      const html = UrlFetchApp.fetch(url).getContentText().match(/root\.App\.main = ([\s\S]+?);\n/);
      if (!html || html.length == 1) return;
      const tempObj = JSON.parse(html[1].trim());
      const obj = tempObj.context.dispatcher.stores;
      const header = ["date", "amount", "open", "high", "low", "close", "adjclose", "volume"];
      const values = [header, ...obj.HistoricalPriceStore.prices
        .map(o => header.map(h => {
          if (h == "date") {
            return new Date(o[h] * 1000);
          } else if (h == "amount" && o[h]) {
            return `${o[h]} ${o.type}`;
          }
          return o[h];
        }))];
    
      const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1"); // Please set your sheet name.
      sheet.getRange(sheet.getLastRow() + 1, 1, values.length, values[0].length).setValues(values);
    }
    

Note:

  • If tempObj.context.dispatcher.stores contains salted base64 data instead of a JSON object, please check this answer.

Tanaike
  • Do you think that this is a good question to be used as duplicate target for questions about importing data from Yahoo Finance into Google Sheets? Related (Meta question) [Canonical question for new questions about importing data from Yahoo Finance into Google Sheets](https://meta.stackoverflow.com/q/422494/1595451) – Rubén Jan 04 '23 at 05:33
  • @Rubén About your comment, I think that it is a difficult and important question. As far as I understand, no API is currently provided for retrieving values from Yahoo Finance. So users retrieve the values from the site's HTML, and such questions keep being posted. But the server-side specification changes often, and when it does, the method in the accepted answer stops working. I guess that this is why the same questions are posted again. – Tanaike Jan 04 '23 at 09:19
  • @Rubén Here, when the server-side specification changes, I'm not sure whether an answer needs to be updated by continually checking for such changes. I think that the method in the accepted answer remains useful for other sites and users even though it no longer works here after the change. So, when the specification changes, I think it is useful for users if a new question is posted and answered with the current method. – Tanaike Jan 04 '23 at 09:19
  • @Rubén These are just my comment. If I misunderstood your comment and the current situation, I apologize. – Tanaike Jan 04 '23 at 09:20
  • Thank you very much for your reply. You understood my comment perfectly. I think that this question is a good example of several others with the same problems: 1) the OP did not follow the [ask] / Ask questions wizard guidelines, 2) X-Y problem: the OP asked how to fix an error instead of asking for help understanding how to analyse a webpage to determine which tool might be used to scrape its data, in this case from Yahoo Finance, 3) Yahoo Finance, like many modern websites, constantly changes its DOM ids, class names, etc. – Rubén Jan 04 '23 at 17:33
  • I think that this kind of question should be closed as a duplicate of a canonical question written specifically for websites like Yahoo Finance that include the data as JSON. I hope to post a draft on meta soon and share the link with you. – Rubén Jan 04 '23 at 17:44
  • @Rubén Thank you for replying. I think that when a value is retrieved from raw JSON data embedded in HTML data from this URL, the questions are duplicated questions. But, recently, it seems that the salted base64 data is used instead of normal base64 data and raw JSON data. In this case, it is required to use a specific decode process. I think that this might be required to be separated from the above questions. – Tanaike Jan 05 '23 at 00:17
0

This is not possible, because the Yahoo site uses a JavaScript element, the infinite scroll, which kicks in after the 100th value. That is why you can't get past that point. You can test this by disabling JavaScript for the site; whatever is still shown can be scraped.

player0
0

It's possible with a workaround:

YahooFinance

Later than 100 days:

YF2

  • Cell with green background: the code to search
  • Cells with orange background: cells containing formulas
  • Cells with yellow background: data returned

Formulas used:

=IMPORTXML(A1;"substring-before(substring-after(//script[@id='fc'],'{""prices"":'),',""isPending')")
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A3;"},{";"|");",";";");".";",")
=REGEXREPLACE(A4;"[a-z:{}\[\]""]+";"")
=TRANSPOSE(SPLIT(A5;"|"))
=(((C8/60)/60)/24)+DATE(1970;1;1)

  • IMPORTXML to import the data.
  • SUBSTITUTE and REGEXREPLACE to prepare the TRANSPOSE step.
  • TRANSPOSE to "build" the lines and SPLIT to "build" the columns.
  • DATE to transform the timestamp to a date.
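
For comparison, the extraction and date steps can be sketched in JavaScript. This is an illustration only, assuming the script text contains a "prices" array followed by ,"isPending" as the IMPORTXML formula above expects; the function names and the sample text are hypothetical.

```javascript
// Cut the "prices" array out of the script text, similar to the
// substring-after / substring-before XPath in the IMPORTXML formula.
function extractPrices(scriptText) {
  const startMarker = '{"prices":';
  const start = scriptText.indexOf(startMarker);
  if (start === -1) return [];
  const rest = scriptText.slice(start + startMarker.length);
  const end = rest.indexOf(',"isPending');
  if (end === -1) return [];
  return JSON.parse(rest.slice(0, end));
}

// Same arithmetic as =(((C8/60)/60)/24)+DATE(1970;1;1):
// Unix seconds to days, offset by the serial number of 1970-01-01 (25569).
function unixToSheetsSerial(seconds) {
  return seconds / 60 / 60 / 24 + 25569;
}

// Made-up script text, for illustration only.
const sampleText = 'x={"prices":[{"date":86400,"adjclose":1.4}],"isPending":false}';
const prices = extractPrices(sampleText);
```

Parsing the array as JSON avoids the locale-dependent separator replacements that the formula version needs.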

Sheet

E.Wiest
  • Thanks a lot for your solution! I will have a look at it. Currently, I have been working on a script to store the latest value every day in a big list:

    function storeValue() {
      var ss = SpreadsheetApp.getActiveSpreadsheet();
      var sheet = ss.getSheetByName('Sheet1'); // where importXML is
      var value = sheet.getRange("B1").getValue(); // where the cell of interest is
      var sheet2 = ss.getSheetByName('Sheet2'); // where to store the data
      var height = sheet2.getLastRow();
      sheet2.insertRowAfter(height);
      sheet2.getRange(height+1, 1, 1, 2).setValues([[new Date(), value]]);
    }

    – Max May 06 '20 at 16:59
  • As of January 4, it looks like this solution is not working anymore. – Rubén Jan 04 '23 at 05:36
-1

Answer:

IMPORTXML cannot retrieve data that is populated by a script, so this formula cannot retrieve the rows of this table beyond the first 100.

More Information:

As the first 100 values are loaded into the page without the use of JavaScript (as you can see by disabling JavaScript for https://sg.finance.yahoo.com/quote/SPY/history?p=SPY and reloading the page), the information can be retrieved by IMPORTXML.

As the data after the first 100 results is generated on the fly after scrolling down the page, it is not retrievable by IMPORTXML. As far as the formula sees, there is no 101st <tr> element, so it displays N/A: Imported content is empty.
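
One way to check this is to count the rows in the HTML that the server returns before any JavaScript runs. The sketch below is only an illustration: countTableRows is a hypothetical helper, the sample HTML is made up, and the regular expression gives a rough count rather than a real HTML parse.

```javascript
// Hypothetical helper: count opening <tr> tags in raw HTML text.
function countTableRows(html) {
  const matches = html.match(/<tr[\s>]/gi);
  return matches ? matches.length : 0;
}

// In Apps Script, the raw HTML could be fetched like this (not run here):
// const html = UrlFetchApp.fetch(url).getContentText();

// Made-up static HTML, for illustration only.
const staticHtml =
  '<table><tbody><tr><td>1</td></tr><tr class="x"><td>2</td></tr></tbody></table>';
const rowCount = countTableRows(staticHtml);
```

If the count for the fetched page is lower than the row index used in the XPath, the formula has nothing to select.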

Nimantha
Rafa Guillermo
  • Thanks for the explanation! Bad news, but now I understand the issue. Do you see a way to create my own database within Google Sheets, so that it adds each day's close price to a list of historical closes? Then I would be able to go back more than 100 days in my own database :-) Thanks in advance – Max May 06 '20 at 15:03
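
A possible approach to such a self-built history is a script that appends the latest value once a day. The following is a rough sketch under assumptions: the helper appendDailyValue, the sheet names, and the cell B1 are hypothetical, and the Apps Script wiring is shown only as comments.

```javascript
// Pure helper: append [date, value] unless a row for that UTC day already exists.
function appendDailyValue(rows, date, value) {
  const day = date.toISOString().slice(0, 10); // e.g. "2020-05-06"
  const exists = rows.some(r => r[0].toISOString().slice(0, 10) === day);
  return exists ? rows : [...rows, [date, value]];
}

// Apps Script wiring (assumed sheet names and cell; not run here):
// function storeValue() {
//   const ss = SpreadsheetApp.getActiveSpreadsheet();
//   const value = ss.getSheetByName("Sheet1").getRange("B1").getValue();
//   ss.getSheetByName("Sheet2").appendRow([new Date(), value]);
// }
// A daily time-driven trigger could be created once with:
// ScriptApp.newTrigger("storeValue").timeBased().everyDays(1).create();

// Made-up values, for illustration only.
const history = appendDailyValue([], new Date("2020-05-06T15:00:00Z"), 284.25);
const sameDay = appendDailyValue(history, new Date("2020-05-06T20:00:00Z"), 285.0);
const nextDay = appendDailyValue(history, new Date("2020-05-07T15:00:00Z"), 286.5);
```

Skipping a day that is already stored keeps the list free of duplicates if the trigger happens to fire twice.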