0

I'm trying to download a .zip file from a url and extract a .xls to read it.

while download the zip file and extract it was a simples task, i wasn't able to read it, everytime i try to read the .xls file i get the following error message:

Erro: filepath: .\RtmpwNCJdK/Tab_Compl_CNT_1T21.xls libxls error: Unable to open file


# if(!require(readxl)){ install.packages("readxl") }

url <- "https://ftp.ibge.gov.br/Contas_Nacionais/Contas_Nacionais_Trimestrais/Tabelas_Completas/Tab_Compl_CNT.zip"

temp_ <- tempdir()

download.file( url = url,
               destfile = paste0(temp_,"/Tab_Compl_CNT.zip"),
               mode = "wb" )

unzip( zipfile=paste0(temp_,"/Tab_Compl_CNT.zip"), exdir = temp_ )

file <- readxl::read_excel( path = paste0(temp_,"/Tab_Compl_CNT_1T21.xls"), 
                            sheet = "Base Móvel", 
                            skip = 3 )  


To assure that the file was really a xls file i run:

readxl:::format_from_signature( paste0(temp_,"/Tab_Compl_CNT_1T21.xls") )

[1] "xls"

Thanks for any help!

  • My guess is that the file is corrupt somehow. It is not too uncommon for Excel to be able to open files that "compatible" libraries cannot; perhaps open it in Excel then save as a new file? I know this breaks the automated mode of your code. – r2evans Jun 21 '21 at 00:14

1 Answers1

1

I think I can confirm @r2evan's diagnosis. I unzipped the file, opened it and re-saved to a different file using OpenOffice, and was then able to use readxl::read_excel() to read the file.

Note that the file also has "interesting" structure at the beginning - hierarchical/merged labels - that you'll have to deal with as well, the simplest solution would be to skip the first 4 rows and add your own column names manually, although all of these problems [except the one you initially posed of unreadability] can be solved programmatically if you're willing to work hard enough.

This question lists related problems and potential solutions, although the easy ones don't seem to work: there is one programmatic solution there (using OpenOffice command-line tools), everything else is "open it and re-save it with Excel/OpenOffice".

  • gdata::read.xls() opens the original file successfully but then hits "invalid multibyte string"
  • XLConnect::readWorksheetFromFile("Tab_Compl_CNT_1T21.xls"):

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘readWorksheet’ for signature ‘"workbook", "missing"’

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453