I have some scripts that work well with ReporteRs, and I am trying to update them to use officer. The scripts are quite repetitive: they output much the same thing many times, occasionally changing the font. After the conversion, the scripts are so slow that I will not be able to use them. They run in a few minutes with ReporteRs but take far longer with officer.

Why is running this 5000 times in officer:

body_add_par(doc, "")

so much slower than the equivalent in ReporteRs?

doc <- addParagraph(doc, '')

Many thanks

Code (the vector all has 2000+ elements):

library(officer)

# OutputDir, Dir, Summary, Types and Detail are defined elsewhere in the script
outputFile <- paste0(OutputDir, "test.docx")

#SET STYLES
norm <- fp_text(color = "black", font.size = 10, bold = FALSE, italic = FALSE,
                underlined = FALSE, font.family = "Arial", vertical.align = "baseline",
                shading.color = "transparent")

norm_red <- fp_text(color = "red", font.size = 10, bold = FALSE, italic = FALSE,
                underlined = FALSE, font.family = "Arial", vertical.align = "baseline",
                shading.color = "transparent")

norm_blue <- fp_text(color = "blue", font.size = 10, bold = FALSE, italic = FALSE,
                    underlined = FALSE, font.family = "Arial", vertical.align = "baseline",
                    shading.color = "transparent")

norm_green <- fp_text(color = "green", font.size = 10, bold = FALSE, italic = FALSE,
                     underlined = FALSE, font.family = "Arial", vertical.align = "baseline",
                     shading.color = "transparent")

bold <- fp_text(color = "black", font.size = 10, bold = TRUE, italic = FALSE,
                underlined = FALSE, font.family = "Arial", vertical.align = "baseline",
                shading.color = "transparent")

bold_red <- fp_text(color = "red", font.size = 10, bold = TRUE, italic = FALSE,
                    underlined = FALSE, font.family = "Arial", vertical.align = "baseline",
                    shading.color = "transparent")

bold_blue <- fp_text(color = "blue", font.size = 10, bold = TRUE, italic = FALSE,
                     underlined = FALSE, font.family = "Arial", vertical.align = "baseline",
                     shading.color = "transparent")

bold_green <- fp_text(color = "green", font.size = 10, bold = TRUE, italic = FALSE,
                      underlined = FALSE, font.family = "Arial", vertical.align = "baseline",
                      shading.color = "transparent")

doc <- read_docx()

#ADD TITLE
fpar_ <- fpar(ftext("ASSIGNMENTS", prop = bold))
doc <- body_add_fpar(doc, fpar_, style = "centered", pos = "on")
doc <- body_add_par(doc, "", style = NULL, pos = "after")

#ADD DATE, DIRECTORY
fpar_ <- fpar(ftext("DATE: ", prop = bold),
              ftext(date(), prop = norm))
doc <- body_add_fpar(doc, fpar_, style = "Normal", pos = "after")

fpar_ <- fpar(ftext("DIRECTORY: ", prop = bold),
            ftext(Dir, prop = norm))
doc <- body_add_fpar(doc, fpar_, style = "Normal", pos = "after")

doc <- body_add_par(doc, "", style = NULL, pos = "after")

#Get all
all <- as.character(Summary$Name)

for (i in seq_along(all)) {

  res <- as.numeric(Types[Types$Num==all[i], "Code"])

  if (5 %in% res || 12 %in% res) {
    #Green
    fpar_ <- fpar(ftext(all[i], prop = bold_green))
  } else if (7 %in% res) {
    #Red
    fpar_ <- fpar(ftext(all[i], prop = bold_red))
  } else if (8 %in% res) {
    #Blue
    fpar_ <- fpar(ftext(all[i], prop = bold_blue))
  } else {
    fpar_ <- fpar(ftext(all[i], prop = bold))
  }

  doc <- body_add_fpar(doc, fpar_, style = "Normal", pos = "after")

  #Get list of files
  res <- unique(Detail[Detail$Num==all[i], c("Name", "Cat")])

  #OUTPUT FILE NAME AND CAT
  if (nrow(res) == 0) {
      #NO FILE FOUND
  } else {

     for (j in 1:nrow(res)) {

       fpar_ <- fpar(ftext(paste(as.character(res[j, "Name"]), " "), prop = bold),
                     ftext(as.character(res[j, "Cat"]), prop = norm))
       doc <- body_add_fpar(doc, fpar_, style = "Normal", pos = "after")
     }
  }
  doc <- body_add_par(doc, "", style = NULL, pos = "after")
}

print(doc, target = outputFile)
David Gohel
Jim Smith

1 Answer


This is a comparison with some reproducible code:

library(officer)
library(ReporteRs)
library(microbenchmark)

docx() # the first call can be slow because of Java initialization


mb <- microbenchmark::microbenchmark(
  officer = {
    doc <- read_docx()
    for(i in 1:100){
      doc <- body_add_par(doc, "")
    }
  }, 
  ReporteRs = {
    doc <- docx()
    for(i in 1:100){
      doc <- addParagraph(doc, '')
    }
  } )

The results are below; officer is faster in this benchmark:

> mb
Unit: milliseconds
      expr      min       lq     mean   median       uq      max neval
   officer 224.3742 232.9602 238.8452 237.5110 241.5320 325.4288   100
 ReporteRs 311.7194 337.9194 349.7107 343.9703 353.8814 447.2623   100
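
The benchmark above only uses 100 paragraphs per run. To check how officer behaves as the loop count grows, a minimal sketch along the following lines could be used. The helper name time_empty_pars and the chosen sizes are assumptions for illustration and are not part of officer; only read_docx() and body_add_par() come from the package. No timings are shown here because they depend on the machine and the package version.

```r
library(officer)

# Hypothetical helper (not part of officer): time n empty-paragraph insertions
# into a fresh document and return the elapsed wall-clock time in seconds.
time_empty_pars <- function(n) {
  doc <- read_docx()
  timing <- system.time(
    for (i in seq_len(n)) {
      doc <- body_add_par(doc, "")
    }
  )
  timing[["elapsed"]]
}

# Illustrative loop counts; increase them to probe larger documents.
sizes <- c(100, 200, 400)

timings <- data.frame(
  n = sizes,
  seconds = vapply(sizes, time_empty_pars, numeric(1))
)

# If the cost per paragraph stays roughly constant, this column stays flat;
# a column that grows with n suggests the total cost grows faster than linearly.
timings$seconds_per_par <- timings$seconds / timings$n

print(timings)
```

Comparing seconds_per_par across rows gives a rough indication of whether the per-paragraph cost depends on the document size.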

Here is my sessionInfo() result:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-4 ReporteRs_0.8.10     ReporteRsjars_0.0.4  officer_0.3.2       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17      knitr_1.20        xml2_1.2.0        magrittr_1.5      uuid_0.1-2        xtable_1.8-2     
 [7] R6_2.2.2          tools_3.5.1       rvg_0.1.9.001     R.oo_1.22.0       png_0.1-7         htmltools_0.3.6  
[13] yaml_2.1.19       digest_0.6.15     zip_1.0.0         rJava_0.9-10      shiny_1.1.0       later_0.7.3      
[19] base64enc_0.1-3   R.utils_2.6.0     promises_1.0.1    mime_0.5          compiler_3.5.1    gdtools_0.1.7    
[25] R.methodsS3_1.7.1 httpuv_1.4.4.2
David Gohel
  • Is this still true if you use a number much larger than 100? – Jim Smith Jul 20 '18 at 10:12
  • What about your case? Can you come up with code that shows officer is slower? In that case, I would be happy to try to improve the package. – David Gohel Jul 20 '18 at 12:59
  • After your first answer I tried the same code as you have included with no problem. I then removed the microbench code and tried a loop of 5000 for ReporteRs and officer separately. For me the ReporteRs test took less than half a second but the officer test is still going after 10 minutes. Perhaps it's my setup: officer 0.3.1 / R version 3.5.1 (2018-07-02) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 – Jim Smith Jul 20 '18 at 14:10
  • OK, I will have a look. Still, can you show some of your code? I'd like to see which functions you are using (I cannot believe you are looping 5000 times to produce empty paragraphs ;)). – David Gohel Jul 20 '18 at 14:29