I am having trouble reading data from *.xls
files into R. I'm trying to use readxl::read_xls()
to read the data in the Microsoft Excel file from following URL: https://www.misoenergy.org/Library/Repository/Market%20Reports/20171114_5min_exante_lmp.xls. I am on R version 3.4.1 (Single Candle) and the output from sessionInfo()
is pasted at the very bottom of this post.
The file has 6 sheets that contain data. As a minimal example, consider reading the second sheet, named RT Ex-Ante 5 Minute LMPs(1)
. The code below was my first attempt at reading this sheet:
library(readxl)
fpath <- '/Users/bmosovsky/Downloads/20171114_5min_exante_lmp.xls'
data <- read_excel( path=fpath, sheet=2, col_names=FALSE )
This allows read_excel to guess the range of data to read and the column types. I get the the warning message,
Warning message:
In read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
Expecting logical in B65535 / R65535C2: got 'IPL.CC.IPLEV01'
and str(data)
returns
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 65535 obs. of 6 variables:
$ X__1: POSIXct, format: "2017-11-13 04:35:00" "2017-11-13 04:35:00" "2017-11-13 04:35:00" "2017-11-13 04:35:00" ...
$ X__2: logi NA NA NA NA NA NA ...
$ X__3: logi NA NA NA NA NA NA ...
$ X__4: logi NA NA NA NA NA NA ...
$ X__5: logi NA NA NA NA NA NA ...
$ X__6: logi NA NA NA NA NA NA ...
Thinking that perhaps read_excel()
was just incorrectly guessing the column types, I then tried:
data1 <- read_excel( path=fpath, sheet=2, col_names=FALSE,
col_types=c('text', 'text', 'numeric', 'numeric', 'numeric', 'numeric') )
This got rid of the warning, as the columns were correctly typed, but I'm still getting NA
values for all columns except the first. This time str(data1)
returned
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 65535 obs. of 6 variables:
$ X__1: chr "43052.2" "43052.2" "43052.2" "43052.2" ...
$ X__2: chr NA NA NA NA ...
$ X__3: num NA NA NA NA NA NA NA NA NA NA ...
$ X__4: num NA NA NA NA NA NA NA NA NA NA ...
$ X__5: num NA NA NA NA NA NA NA NA NA NA ...
$ X__6: num NA NA NA NA NA NA NA NA NA NA ...
Finally, I tried pasting the first 10 rows of data (formats and all) from the second sheet of the Excel file into a fresh Excel workbook, saving as test.xls
, and trying the following:
fpath_test <- '/Users/bmosovsky/Downloads/test.xls'
data_test <- read_excel( path=fpath_test, sheet=1, col_names=FALSE,
col_types=c('text', 'text', 'numeric', 'numeric', 'numeric', 'numeric') )
Now str(data_test)
returns the correct result:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 6 variables:
$ X__1: chr "43052.2" "43052.2" "43052.2" "43052.2" ...
$ X__2: chr "CIN.MARKLND.3" "CIN.MIAMWAB.1" "CIN.MIAMWAB.2" "CIN.MIAMWAB.3" ...
$ X__3: num 22.4 22.6 22.6 22.6 22.5 ...
$ X__4: num 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6 21.6
$ X__5: num 0.8 1.02 1.02 1.02 0.92 0.93 1.29 1.29 1.29 0.06
$ X__6: num 0.04 0.01 0.01 0.01 0.01 0.01 0.05 0.05 0.05 0.06
So, my question is, what is unique to the downloaded Excel file that won't allow the data to be read into R properly? I am trying to read in this data as part of an automated data harvesting process, so any kind of manual manipulation of the Excel file is not possible as a workaround. Can anyone provide some insight as to how I can get data from all the sheets of the .xls
file into R for processing?
Here is the output from sessionInfo()
:
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 rvest_0.3.2 xml2_1.1.1 RPostgreSQL_0.6-2 DBI_0.7-12 lubridate_1.6.0 dplyr_0.7.2 readxl_1.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12.3 tidyr_0.6.3 assertthat_0.2.0 cellranger_1.1.0 R6_2.2.2 magrittr_1.5 httr_1.2.1 rlang_0.1.1 stringi_1.1.5
[10] curl_2.8.1 stringr_1.2.0 glue_1.1.1 compiler_3.4.1 pkgconfig_2.0.1 bindr_0.1 tibble_1.3.3