4

I am trying to read all EU weekly oil bulletin data files from this source http://ec.europa.eu/energy/observatory/reports, specifically all xls-files with "raw_data" in the file name.

library(rvest)
library(readxl)
library(tidyverse)

url <- "http://ec.europa.eu/energy/observatory/reports/"
files <- read_html(url) %>% html_nodes("a") %>% .[grepl("raw",.)] %>% html_attr("href")

However, read_excel fails to correctly parse all columns of the excel files and returns only the first (date) column. See below example of file 161.

t <- tempfile(fileext = ".xls")
download.file(paste0(url, files[161]), t, mode="wb")
data <- read_excel(t)
unlink(t)

which just returns

A tibble: 126 x 1
`Prices in force on`
<dttm>              
1 2018-03-19 00:00:00 
2 2018-03-19 00:00:00
....

I know that I can download all xls-files and convert them to .xlsx or .csv files using excelcnv.exe, but that is relatively slow and it would be good to have a pure R solution. Any idea how to read all information from the excel files? Many thanks!

user2554330
  • 37,248
  • 4
  • 43
  • 90
GypsyEyes
  • 41
  • 2

0 Answers0