0

I found an old thread (How do you read a password protected excel file into r?) that recommended that I use the following code to read in a password protected file:

install.packages("excel.link")

library("excel.link")

dat <- xl.read.file("TestWorkbook.xlsx", password = "pass", write.res.password="pass")

dat

However, when I try to do this my R immediately crashes. I've tried removing the write.res.password argument, and that doesn't seem to be the issue. I have a hunch that excel.link might not work with the newest version of R, so if you know of any other ways to do this I'd appreciate the advice.

EDIT: Using read.xlsx generates this error:

    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "newInstance", .jfindClass(class),  : 
     
    org.apache.poi.poifs.filesystem.OfficeXmlFileException: 
The supplied data appears to be in the Office 2007+ XML. 
You are calling the part of POI that deals with OLE2 Office Documents. 
You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
  • Have you tried [read.xlsx](https://www.rdocumentation.org/packages/xlsx/versions/0.6.5/topics/read.xlsx) from the xlsx package? It does, unfortunately, require Java + the rJava package. – neilfws Mar 08 '21 at 03:53
  • @neilfws I've tried that package, but am met with an error. I'll add it to the post. Can this be fixed by just installing the rJava package? – Jamie Warren Mar 08 '21 at 23:05
  • You will need to install `rJava` and perhaps configure it too depending on your OS. It is not always easy, which is why `readxl` is preferred. Unfortunately `readxl` does not handle password-protected files (so far as I know). – neilfws Mar 08 '21 at 23:08

2 Answers2

0

You can remove the password of the excel file without knowing it with the following function (adapted version of code available at https://www.r-bloggers.com/2018/05/remove-password-protection-from-excel-sheets-using-r/)

remove_Password_Protection_From_Excel_File <- function(dir, file, bool_XLSXM = FALSE)
{
  initial_Dir <- getwd()
  setwd(dir)

  # file name and path after removing protection
  if(bool_XLSXM == TRUE)
  {
    file_unlocked <- stringr::str_replace(basename(file), ".xlsm$", "_unlocked.xlsm")

  }else
  {
    file_unlocked <- stringr::str_replace(basename(file), ".xlsx$", "_unlocked.xlsx")
  }

  file_unlocked_path <- file.path(dir, file_unlocked)

  # create temporary directory in project folder
  # so we see what is going on
  temp_dir <- "_tmp"

  # remove and recreate _tmp folder in case it already exists
  unlink(temp_dir, recursive = TRUE)
  dir.create(temp_dir)

  # unzip Excel file into temp folder
  unzip(file, exdir = temp_dir)

  # get full path to XML files for all worksheets
  worksheet_paths <- list.files(paste0(temp_dir, "/xl/worksheets"), full.name = TRUE, pattern = ".xml")

  # remove the XML node which contains the sheet protection
  # We might of course use e.g. xml2 to parse the XML file, but this simple approach will suffice here
  for(ws in worksheet_paths)
  {
    file_Content <- readLines(ws, encoding = "windows1")

    # the "sheetProtection" node contains the hashed password "<sheetProtection SOME INFO />"
    # we simply remove the whole node
    out <- str_replace(file_Content, "<sheetProtection.*?/>", "")
    writeLines(out, ws)
  }

  worksheet_Protection_Paths <- paste0(temp_dir, "/xl/workbook.xml")
  file_Content <- readLines(worksheet_Protection_Paths , encoding = "windows1")
  out <- stringr::str_replace(file_Content, "<workbookProtection.*?/>", "")
  writeLines(out, worksheet_Protection_Paths)

  # create a new zip, i.e. Excel file, containing the modified XML files
  old_wd <- setwd(temp_dir)
  files <- list.files(recursive = T, full.names = F, all.files = T, no.. = T)

  # as the Excel file is a zip file, we can directly replace the .zip extension by .xlsx
  zip::zip(file_unlocked_path, files = files) # utils::zip does not work for some reason
  setwd(old_wd)

  # clean up and remove temporary directory
  unlink(temp_dir, recursive = T)
  setwd(initial_Dir)
}

Once the password is removed, you can read the Excel file. This approach works for me.

Emmanuel Hamel
  • 1,769
  • 7
  • 19
0

You can also consider the following approach to open the Excel file :

library(RDCOMClient)
xlApp <- COMCreate("Excel.Application")
xlApp[["DisplayAlerts"]] <- FALSE
xlApp[["Visible"]] <- TRUE

path_To_Excel_File <- "D:/excel_File.xlsx"
xlWbk <- xlApp$Workbooks()$Open(path_To_Excel_File, password = "xxx")
Emmanuel Hamel
  • 1,769
  • 7
  • 19