Description: I am trying to retrieve historical data from Investing.com using the httr library.

Original page: https://www.investing.com/rates-bonds/austria-1-year-bond-yield-historical-data

Expected output: an HTML table with historical data (sample table output).

Script logic:

  • Send a POST request with httr
  • Parse the response with read_html and convert it to a data frame with html_table

Issue:

  • The script retrieves tables from the main page instead of the actual history table

Code:

library(httr)
library(rvest)  # provides read_html() and html_table()

url <- 'https://www.investing.com/instruments/HistoricalDataAjax'

# mimic XHR POST request implemented in the investing.com website
http_resp <- POST(url = url,
                 body = list(
                   curr_id = "23859", 
                   smlID = "202274", 
                   header = "Austria+1-Year+Bond+Yield+Historical+Data",
                   st_date = "08/01/2021", # MM/DD/YYYY format
                   end_date = "08/20/2021",
                   interval_sec = "Daily",
                   sort_col = "date",
                   sort_ord = "DESC",
                   action = "historical_data"
                 )
                )

# parse the returned HTML and print the first table found
html_doc <- read_html(http_resp)
print(html_table(html_doc)[[1]])

You might notice that the R script uses a different URL, https://www.investing.com/instruments/HistoricalDataAjax, than the original web page, https://www.investing.com/rates-bonds/austria-1-year-bond-yield-historical-data. The reason is that this appears to be the endpoint the page itself sends a POST request to when the start and end dates are set. You can see this in the screenshot below:

(Screenshot: XHR request header when setting the start and end dates)

From what I can see, when a user specifies a date range for a particular security, the website sends a request to HistoricalDataAjax, with the parameters and the identifiers of the security/asset specified in the body of the request. (Screenshot: example of the request's body after selecting dates)
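
For illustration, a request shaped more like the browser's XHR could be sketched as below. One fact worth noting is that httr's `POST()` sends a list body as multipart form data unless `encode` is set. Everything else here is an assumption rather than confirmed behaviour of the site: that the endpoint expects a URL-encoded form, that it checks for an `X-Requested-With` header, and that a browser-like user agent is needed. The helper names `build_history_body` and `fetch_history` are hypothetical, and the site may still reject or redirect the request.

```r
library(httr)
library(rvest)

# Hypothetical helper: build the form fields shown in the request body.
# Dates are converted to the MM/DD/YYYY format used in the question.
build_history_body <- function(curr_id, sml_id, header, from, to) {
  list(
    curr_id      = curr_id,
    smlID        = sml_id,
    header       = header,  # plain spaces; form encoding escapes them
    st_date      = format(as.Date(from), "%m/%d/%Y"),
    end_date     = format(as.Date(to), "%m/%d/%Y"),
    interval_sec = "Daily",
    sort_col     = "date",
    sort_ord     = "DESC",
    action       = "historical_data"
  )
}

# Hypothetical helper: send the request and return the first table found.
fetch_history <- function(body) {
  resp <- POST(
    url    = "https://www.investing.com/instruments/HistoricalDataAjax",
    body   = body,
    encode = "form",                                     # assumption: URL-encoded form
    add_headers(`X-Requested-With` = "XMLHttpRequest"),  # assumption: marks the call as XHR
    user_agent("Mozilla/5.0")                            # assumption: browser-like agent
  )
  stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx
  html   <- content(resp, as = "text", encoding = "UTF-8")
  tables <- html_table(read_html(html))
  if (length(tables) == 0) stop("no table found in the response")
  tables[[1]]
}

body <- build_history_body(
  "23859", "202274", "Austria 1-Year Bond Yield Historical Data",
  from = "2021-08-01", to = "2021-08-20"
)
# history <- fetch_history(body)  # performs the network request
```

Keeping the body construction separate from the request makes it possible to check the date formatting without touching the network.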

  • The link `https://www.investing.com/instruments/HistoricalDataAjax` redirects to the homepage. Can you provide a proper link? – Nad Pat Sep 18 '21 at 18:27
  • Hi @NadPat! Glad to see the post got your attention. Here is the original link: https://www.investing.com/rates-bonds/austria-1-year-bond-yield-historical-data Note, though, that I used HistoricalDataAjax because this is where the original request appears to POST the query. I will add more clarification to the original post. – Artem Kochnev Sep 20 '21 at 13:48
  • To download the data you need to sign in. Would an `RSelenium` solution work for you? – Nad Pat Sep 20 '21 at 14:35
  • I would rather not use Selenium. First, it introduces an additional dependency on the user's machine. Second, I would expect execution to be slower. Third, I know that the sign-in can be bypassed with a POST request: when the data is updated on the page through XHR, the response is an HTML table. The real question is how to fetch it properly. I know for a fact that `investpy` implements it exactly like that. `investpy` is a Python library, though, and I need an implementation in R. – Artem Kochnev Sep 20 '21 at 14:37

1 Answer

You can read the table at

https://www.investing.com/rates-bonds/austria-1-year-bond-yield-historical-data

using rvest:

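library(rvest)

url <- 'https://www.investing.com/rates-bonds/austria-1-year-bond-yield-historical-data'

# read the page and extract every HTML table as a list of tibbles
df <- url %>%
  read_html() %>%
  html_table()

# the first table holds the historical data
df[[1]]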
# A tibble: 25 x 6
   Date          Price   Open   High    Low `Change %`
   <chr>         <dbl>  <dbl>  <dbl>  <dbl> <chr>     
 1 Dec 09, 2021 -0.669 -0.672 -0.633 -0.695 11.69%    
 2 Dec 08, 2021 -0.599 -0.6   -0.549 -0.647 -2.28%    
 3 Dec 07, 2021 -0.613 -0.621 -0.536 -0.656 -7.54%    
 4 Dec 06, 2021 -0.663 -0.648 -0.565 -0.687 -0.30%    
 5 Dec 03, 2021 -0.665 -0.681 -0.577 -0.684 0.45%     
 6 Dec 02, 2021 -0.662 -0.59  -0.573 -0.669 0.46%     
 7 Dec 01, 2021 -0.659 -0.608 -0.577 -0.685 1.70%     
 8 Nov 30, 2021 -0.648 -0.697 -0.601 -0.736 -4.85%    
 9 Nov 29, 2021 -0.681 -0.715 -0.647 -0.745 -12.47%   
10 Nov 27, 2021 -0.778 -0.701 -0.701 -0.778 7.61%  
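
In the output above, `Date` and `Change %` are read as character columns. A minimal sketch of converting them, so that rows can be filtered by a date range, might look like this. The helper name `clean_history` is hypothetical, the sample rows are copied from the printed table, and parsing month abbreviations with `%b` assumes an English locale.

```r
# Hypothetical helper: convert the character columns of the scraped table.
clean_history <- function(tbl) {
  # "Dec 09, 2021" -> Date (the %b month abbreviation is locale-dependent)
  tbl$Date <- as.Date(tbl$Date, format = "%b %d, %Y")
  # "11.69%" -> 11.69
  tbl[["Change %"]] <- as.numeric(sub("%", "", tbl[["Change %"]], fixed = TRUE))
  tbl
}

# Two rows copied from the output above, used as stand-in data.
sample_rows <- data.frame(
  Date       = c("Dec 09, 2021", "Dec 08, 2021"),
  Price      = c(-0.669, -0.599),
  `Change %` = c("11.69%", "-2.28%"),
  check.names = FALSE
)

cleaned <- clean_history(sample_rows)

# Keep only the rows on or after a given date.
recent <- cleaned[cleaned$Date >= as.Date("2021-12-09"), ]
```
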
Nad Pat
  • Hi @Nad Pat, thanks for stopping by! This approach reads only the last 30 days of data. I need a script that retrieves the data for a given range, from t to T. That is why I used the Ajax link, which lets me pass the date range in as parameters of the query. – Artem Kochnev Dec 16 '21 at 16:05
  • How about using `RSelenium`? – Nad Pat Dec 17 '21 at 09:48
  • I remember you suggested `RSelenium` in a comment on the original question. I explained there why it would not be a desirable solution for me. In short, if there is an easier way of extracting the data without the dependencies - as is possible in `investpy` - I would like to replicate it in R. – Artem Kochnev Jan 10 '22 at 14:14