Description: I am trying to retrieve historical data from Investing.com using the httr library.
Original page: https://www.investing.com/rates-bonds/austria-1-year-bond-yield-historical-data
Expected output: an HTML table with historical data (sample table output)
Script logic:
- Send a POST query with httr
- Prettify the output of the read_html method with the html_table method
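The parsing step in the list above can be tried offline on a small fragment before involving the network. This is only a minimal sketch: the HTML string and its values are made up for illustration and are not taken from Investing.com, and it assumes the rvest package (which re-exports read_html) is installed.

```r
library(rvest)

# Hypothetical HTML fragment standing in for a server response (values are placeholders)
sample_html <- paste0(
  "<table>",
  "<tr><th>Date</th><th>Price</th></tr>",
  "<tr><td>Day 1</td><td>1.5</td></tr>",
  "<tr><td>Day 2</td><td>2.5</td></tr>",
  "</table>"
)

# read_html() parses the string; html_table() returns one data frame per <table>
doc <- read_html(sample_html)
tables <- html_table(doc)
history <- tables[[1]]
print(history)
```

If the real response contains several tables, `length(tables)` and the column names of each element are a quick way to see which one, if any, is the history table.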
Issue:
- Script retrieves tables from the main page instead of the actual history table
Code:
library(httr)
library(rvest)  # provides read_html() and html_table()
url <- 'https://www.investing.com/instruments/HistoricalDataAjax'
# mimic XHR POST request implemented in the investing.com website
http_resp <- POST(url = url,
                  body = list(
                    curr_id = "23859",
                    smlID = "202274",
                    header = "Austria+1-Year+Bond+Yield+Historical+Data",
                    st_date = "08/01/2021", # MM/DD/YYYY format
                    end_date = "08/20/2021",
                    interval_sec = "Daily",
                    sort_col = "date",
                    sort_ord = "DESC",
                    action = "historical_data"
                  )
)
# parse the returned HTML
html_doc <- read_html(http_resp)
# print the first table found in the response
print(html_table(html_doc)[[1]])
You might notice that the R script uses a different URL (https://www.investing.com/instruments/HistoricalDataAjax) than the original web page (https://www.investing.com/rates-bonds/austria-1-year-bond-yield-historical-data). The reason is that this appears to be the endpoint the site sends its POST request to when the start and end dates are set. You can see this in the screenshot below:
XHR request header when setting the start and end dates
From what I see, when a user specifies a date range for a particular security, the website sends a query to HistoricalDataAjax with the parameters and the identifiers of the security/asset in the body of the request:
Example of the request's body after selecting dates
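For comparison, here is a hedged sketch of a request that tries to follow the browser's XHR more closely. httr's POST encodes a list body as multipart by default, so this version asks for URL-encoded form data and adds an X-Requested-With header. Whether the server actually requires either of these is an assumption, not something confirmed above. The helper names history_body and fetch_history and the user agent string are hypothetical.

```r
library(httr)

# Hypothetical helper: the form fields from the request body described above
history_body <- function(st_date, end_date) {
  list(
    curr_id = "23859",
    smlID = "202274",
    # plain spaces here; form encoding escapes them when the request is sent
    header = "Austria 1-Year Bond Yield Historical Data",
    st_date = st_date,   # MM/DD/YYYY format
    end_date = end_date,
    interval_sec = "Daily",
    sort_col = "date",
    sort_ord = "DESC",
    action = "historical_data"
  )
}

# Hypothetical helper: sends the POST as a URL-encoded form with an XHR-style header
fetch_history <- function(st_date, end_date) {
  POST(
    url = "https://www.investing.com/instruments/HistoricalDataAjax",
    body = history_body(st_date, end_date),
    encode = "form",  # assumption: the site expects URL-encoded form data
    add_headers(`X-Requested-With` = "XMLHttpRequest"),  # assumption: marks the call as XHR
    user_agent("Mozilla/5.0")  # placeholder user agent
  )
}
```

Calling `fetch_history("08/01/2021", "08/20/2021")` and then checking `status_code()` and `http_type()` on the result should show whether the server answered with the AJAX fragment or with the full page.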