3

Is there way to edit author of .xlsx file from RStudio using xlsx package?

I create simple xlsx workbook

data_1=data.frame(1,1,2)
require(xlsx)
wb <- createWorkbook() 
sheet <- createSheet(wb,"TEST") 
addDataFrame(as.data.frame(data_1),
             sheet=sheet,
             startRow=1,
             startColumn=1,
             row.names=FALSE
             )
saveWorkbook(wb,file = "TEST.xlsx") 

in Author of this xlsx in Excel i see "Apache POI", how can i edit this field?

Cant find something like this in documentation for xlsx.

Thanks!

Batanichek
  • 7,761
  • 31
  • 49
  • The `xlsx` package makes use of Apache POI, but does not set the author. Here is a [related question](https://stackoverflow.com/questions/11143303/how-to-set-author-name-to-excel-file-using-poi) on how to do that in Java. You could implement it yourself for `xlsx` and send a pull request to the [xlsx repo on Github](https://github.com/dragua/xlsx). – ROLO Oct 02 '15 at 09:46
  • 2
    An xlsx file is a zip file that contains a tree of xml files. Within that zip file docprops/core.xml contains the author so that file would need to be edited. You could unzip, edit and rezip yourself with R. – G. Grothendieck Oct 02 '15 at 11:41
  • @G.Grothendieck ideally, yes, but see my non-answer-answer (this used to work, but I've only tried reopening the modified xlsx on the new OS X Office). – hrbrmstr Oct 02 '15 at 12:42

2 Answers2

4
library(XML)

# source workbook ---------------------------------------------------------

xl <- "~/Documents/wb.xlsx"

# make a copy using .zip extension ----------------------------------------

tmp <- tempfile(fileext=".zip")
xl <- path.expand(xl)
ok <- file.copy(xl, tmp)

# unzip it and get file list ----------------------------------------------

tmpdir <- tempfile()
fils <- unzip(tmp, exdir=tmpdir)

# get the doc properties file (one of them anyway) ------------------------

props_file <- grep("docProps/core.xml", fils, value=TRUE)

# open it -----------------------------------------------------------------

props <- xmlTreeParse(props_file, useInternalNodes=TRUE)

# view some info ----------------------------------------------------------

for (tag in c("dc:title", "dc:subject", "dc:creator", "cp:keywords",
              "dc:description", "cp:lastModifiedBy", "cp:category")) {
  print(xmlValue(props[[sprintf("//%s", tag)]]))
}

# modify some info --------------------------------------------------------

# you can do this for any of those properties
creator <- getNodeSet(props, "//dc:creator")[[1]]
xmlValue(creator) <- "Mickey Mouse"

# save out the modified file ----------------------------------------------

saveXML(props, props_file)

# re-zip the archive ------------------------------------------------------

unlink(tmp)
cur <- getwd()
setwd(tmpdir)
zip(tmp, basename(tmpdir))
setwd(cur)

# move new xlsx to source xlsir -------------------------------------------

file.copy(tmp,
          paste0(file.path(dirname(path.expand(xl)), basename(tmpdir)), ".xlsx"),
          overwrite=FALSE) # FALSE for testing only

Test that by making a dummy notebook with some document properties.

Here are two functions from that code, one to view the properties and one to set them via a named vector:

#' Set Excel (xlsx) document properties
#'
#' pass in a named vector. Names should be in this set:
#'
#'   "dc:title", "dc:subject", "dc:creator", "cp:keywords",
#'   "dc:description", "cp:category"
#'
#' @param xl path to excel xlsx file
#' @param file_props document properties to set (named vector)
#' @return new filename created (this doesn't overwrite the existing since
#'         there's not enough error checking)
#' @examples
#' set_properties("path/tp/some.xlsx",
#'                c(`dc:title`="Cool Title",
#'                  `dc:subject`="Cool Subject",
#'                  `dc:creator`="Cool Creator"))
set_properties <- function(xl, file_props) {

  require(XML)

  # make a copy using .zip extension ----------------------------------------

  tmp <- tempfile(fileext=".zip")
  xl <- path.expand(xl)
  ok <- file.copy(xl, tmp)

  # unzip it and get file list ----------------------------------------------

  tmpdir <- tempfile()
  fils <- unzip(tmp, exdir=tmpdir)

  # get the doc properties file (one of them anyway) ------------------------

  props_file <- grep("docProps/core.xml", fils, value=TRUE)

  # open it -----------------------------------------------------------------

  props <- xmlTreeParse(props_file, useInternalNodes=TRUE)

  # you can do this for any of those properties
  for (tag in names(file_props)) {
    # message(sprintf("//%s", tag))
    # message(file_props[tag])
    tval <- getNodeSet(props, sprintf("//%s", tag))[[1]]
    xmlValue(tval) <- file_props[tag]
  }

  # save out the modified file ----------------------------------------------

  saveXML(props, props_file)

  # re-zip the archive ------------------------------------------------------

  unlink(tmp)
  cur <- getwd()
  setwd(tmpdir)
  zip(tmp, "./")
  setwd(cur)

  # move new xlsx to source xlsir -------------------------------------------
  new_fil <- paste0(file.path(dirname(path.expand(xl)), basename(tmpdir)), ".xlsx")
  file.copy(tmp, new_fil, overwrite=TRUE)

  new_fil
}

#' Display Excel (xlsx) document properties
#'
#' @param xl path to excel xlsx file
#' @examples
#' print_properties("path/to/some.xlsx")
print_properties <- function(xl, props) {
  require(XML)

  # make a copy using .zip extension ----------------------------------------

  tmp <- tempfile(fileext=".zip")
  xl <- path.expand(xl)
  ok <- file.copy(xl, tmp)

  # unzip it and get file list ----------------------------------------------

  tmpdir <- tempfile()
  fils <- unzip(tmp, exdir=tmpdir)

  # get the doc properties file (one of them anyway) ------------------------

  props_file <- grep("docProps/core.xml", fils, value=TRUE)

  # open it -----------------------------------------------------------------

  props <- xmlTreeParse(props_file, useInternalNodes=TRUE)
  for (tag in c("dc:title", "dc:subject", "dc:creator", "cp:keywords",
                "dc:description", "cp:category")) {
    cat(sprintf("%16s", tag), ": ", xmlValue(props[[sprintf("//%s", tag)]]), sep="", "\n")
  }

  unlink(tmp)
  unlink(tmpdir)

}
hrbrmstr
  • 77,368
  • 11
  • 139
  • 205
  • 1
    The parent directory should NOT be part of the zip archive. Remove the `setwd("..")` command. Also double check whether the R zip command produces acceptable output -- if not, then run the external 7z program from R instead. `7z -tzip a excelfilename.xlsx .` – G. Grothendieck Oct 02 '15 at 13:43
  • thx. yep. that was it (what a dumb error on my part). updating – hrbrmstr Oct 02 '15 at 13:47
  • Thanks! Its work , but i needed use `system( paste(shQuote("C:\\Program Files (x86)\\PeaZip\\peazip.exe"),"-add2zip",a) )` for zip. ( dont know why) – Batanichek Oct 02 '15 at 14:00
1

It's easiest to use the library openxlsx instead of xlsx. By default, openxlsx saves a file without specifying an author, but you can set one:

require(openxlsx)
write.xlsx(my_data_frame, "my_filename.xlsx", creator="Your Name")
Lewistrick
  • 2,649
  • 6
  • 31
  • 42
  • Could you also read the author when you load the workbook with openxlsx? – dpelisek Apr 08 '21 at 11:49
  • @dpelisek Yes, with `workbook.properties.creator`. – Lewistrick Apr 08 '21 at 12:25
  • @Lawistrick, I did not got it. There is no function ```workbook.properties.creator```. The workbook object has no attribute called ```properties``` either. Could you please explain it more in details, write an example or refer me to the documentation? Thank you! – dpelisek Apr 10 '21 at 07:39
  • I just found it out: ```openxlsx::getCreators(wb)``` returns the author(s) – dpelisek Apr 10 '21 at 07:47
  • Correct, the question is related to R, not Python – dpelisek Apr 13 '21 at 19:30