I am trying to read all EU weekly oil bulletin data files from this source http://ec.europa.eu/energy/observatory/reports, specifically all xls-files with "raw_data" in the file name.
library(rvest)
library(readxl)
library(tidyverse)
url <- "http://ec.europa.eu/energy/observatory/reports/"
files <- read_html(url) %>% html_nodes("a") %>% .[grepl("raw",.)] %>% html_attr("href")
However, read_excel fails to correctly parse all columns of the excel files and returns only the first (date) column. See below example of file 161.
t <- tempfile(fileext = ".xls")
download.file(paste0(url, files[161]), t, mode="wb")
data <- read_excel(t)
unlink(t)
which just returns
A tibble: 126 x 1
`Prices in force on`
<dttm>
1 2018-03-19 00:00:00
2 2018-03-19 00:00:00
....
I know that I can download all xls-files and convert them to .xlsx or .csv files using excelcnv.exe, but that is relatively slow and it would be good to have a pure R solution. Any idea how to read all information from the excel files? Many thanks!