0

Context

I perform data analysis primarily in R, and the final output is typically in the form of one or more MS PowerPoint decks that can be shared with non-technical clients for downstream editing. Over the years, I've streamlined my process to the following:

  1. Preprocess data (clean, merge, etc.) as usual;
  2. Create a .Rmd file for PowerPoint presentation output (as described here) and add all the code used for generating final outputs there; separate the contents into reasonably-sized pieces such that the text / table / plots in each slide of the outputted file doesn't spill beyond the slide's margins.
  3. Knit the markdown file into a MS Powerpoint presentation.

I often deal with 50+ slides & 100+ plot outputs, so the time saved thanks to this workflow, where I don't have to insert individual images into a deck of slides (or worse, try to subsequently pinpoint a specific one for tweaks based on client's feedback), is non-trivial.

One particular characteristic of R markdown for PowerPoint is that the hierarchical structure described by #, ##, ..., #### is flattened to two levels: the top level corresponds to slides that serve as section dividers, while the bottom level corresponds to slides for content (typically in one or two columns). I typically specify the division between top & bottom level as 2, which means a line in the .Rmd file beginning with # blah blah goes into a section divider slide, while a line beginning with ## blah blah will go into the header placeholder on a content slide. The following toy example of a .Rmd file illustrates this:

---
title: Title slide (Slide 1)
output:
  powerpoint_presentation:
    slide_level: 2 # use this to override default
---

# Section header (Slide 2)

Any content here (below a section header & without a slide header preceding it) 
will go into a new header-less slide (Slide 3).

## Slide demonstrating markdown syntax (Slide 4)

*italic*, **bold**, ~subscript~, ^superscript^, [small caps]{.smallcaps}, `verbatim`

# A second section header (Slide 5)

## A slide for numbered points (Slide 6)

1. Numbered points
1. In running order
1. No matter what values were provided

## Another slide on bullet points (Slide 7)

- Bullet point 1
- Bullet point 2

Issues

  1. I like to organize my slides into collapsible sections in PowerPoint for my sanity + ease of navigating in slide show mode, but find it cumbersome to do so manually in PowerPoint, every time I knit a file.

  2. The actual template used for section placeholder slides has two placeholders, one meant for header text (which is what # blah blah feeds into), and one meant for content text - which currently goes wasted. Since these slides serve as section dividers, I would like the second placeholder to show something like "Section [#]", or "Annex [#]", etc.

Note: I know rmarkdown::powerpoint_presentation() includes an option for numbered sections, but that adds a section number before EVERY slide header, & I don't necessarily want numbers to increment automatically at slide level.

Desired state

I'd like to have a solution that takes in the result from knitting the R markdown file and automatically processes it to:

  1. split slides into sections based on location of section header slides; and

  2. modify the section header slides to include subtitles of the form "Section XXX".

Z.Lin
  • 28,055
  • 6
  • 54
  • 94

1 Answers1

1

Solution

I wrote a helper function based off existing functionalities provided by the officer package (for manipulating PowerPoint files) and the xml2 package (for manipulating the underlying XML files), to:

  1. Identify which slides in the inputted PowerPoint deck uses the Section Header template (exit function if there isn't any);
  2. Split the deck into sections at these points, in running order from 0 onwards (the 0th section contains slides preceding the first section divider);
  3. Optionally (if add.subtitle = TRUE), insert "Section [#]" in running order at the content placeholder associated with each slide that uses the Section Header template.

The function also provides for overwriting the original inputted file (if overwrite = TRUE), or saving the new version under a new name ([original file name]_sectioned.pptx by default). In addition, it's chatty by default, because I get worried when I don't see what's going on.

require(officer)
require(xml2)
require(dplyr)

divide.sections <- function(file.path, add.subtitle = TRUE,
                            overwrite = FALSE, new.suffix = "_sectioned",
                            verbose = TRUE) {
  
  # load data
  pp <- read_pptx(file.path)
  
  if(verbose) message("Loaded ", basename(file.path),
                      " as ", class(pp),
                      " object with ", length(pp),
                      " slides.")
  
  # get slide IDs
  slide.ids <- pp$presentation$get() %>%
    xml_child(2) %>%
    xml_find_all("p:sldId") %>%
    xml_attr("id")

  # find which slides use section header template
  layout.name <- pp$slideLayout$names()
  layout.name <- names(which(layout.name == "Section Header"))
  slide.layout <- sapply(seq_along(slide.ids),
                         function(i) pp$slide$get_slide(i)$layout_name())
  names(slide.layout) <- slide.ids
  slide.layout <- slide.layout[slide.layout == layout.name]

  if(length(slide.layout) > 0) {

    # prepare section groupings
    slide.df <- data.frame(slide.id = slide.ids) %>%
      mutate(new.section = slide.id %in% names(slide.layout)) %>%
      mutate(section.seq = cumsum(new.section))

    if(add.subtitle) {
      slide.df1 <- slide.df %>%
        mutate(slide.seq = seq(1, n())) %>%
        filter(new.section)
      
      for(i in seq_len(nrow(slide.df1))) {
        pp <- on_slide(pp, slide.df1[["slide.seq"]][i])
        pp <- ph_with(pp,
                      # "Section" is hard-coded for now, as I haven't had a need for any other term;
                      # but if needed, this can be a parameter in the function too.
                      value = paste("Section", slide.df1[["section.seq"]][i]),
                      location = ph_location_label("Text Placeholder 2"))
      }
      rm(slide.df1)
    }

    slide.df <- cbind(slide.df,
                      suppressWarnings(pptx_summary(pp)) %>%
                        group_by(slide_id) %>%
                        slice(1) %>%
                        ungroup() %>%
                        arrange(slide_id) %>%
                        select(text)) %>%
      mutate(section.title = case_when(new.section ~ paste(section.seq, text, sep = ": "),
                                       row_number() == 1 ~ "0",
                                       TRUE ~ ""))
    
    section.id <- slide.df %>%
      select(section.seq) %>%
      unique() %>%
      mutate(section.id = paste0("{",
                                 toupper(uuid::UUIDgenerate(n = n())),
                                 "}"))
    
    slide.df <- slide.df %>%
      left_join(section.id, by = "section.seq") %>%
      select(slide.id, section.seq, section.id, section.title) %>%
      split(.$section.seq)

    if(verbose) message("\nFound ", length(slide.layout),
                        " slide(s) using the Section Header layout. Dividing ", length(pp),
                        " slides into ", length(slide.df),
                        " sections containing [", paste(sapply(slide.df, nrow), collapse = ", "),
                        "] slides respectively.")

    # insert into relevant xml
    pp$presentation$get() %>%
      xml_child(6) %>%
      xml_add_child(.value = 'p:ext',
                    uri="{521415D9-36F7-43E2-AB2F-B90AF26B5E84}",
                    `xmlns:p14` = "http://schemas.microsoft.com/office/powerpoint/2010/main",
                    .where = 0)
    
    pp$presentation$get() %>%
      xml_child(6) %>%
      xml_child(1) %>%
      xml_add_child(.value = 'p14:sectionLst')

    p.xml <- pp$presentation$get() %>%
      xml_child(6) %>%
      xml_child(1) %>%
      xml_child(1)

    for(i in seq_along(slide.df)) {
      p.xml %>%
        xml_add_child(.value = "p14:section",
                      name = slide.df[[i]][["section.title"]][1],
                      id = slide.df[[i]][["section.id"]][1])
      
      p.xml %>%
        xml_child(i) %>%
        xml_add_child(.value = "p14:sldIdLst")

      for(j in seq_len(nrow(slide.df[[i]]))) {
        p.xml %>%
          xml_child(i) %>%
          xml_child(1) %>%
          xml_add_child(.value = "p14:sldId",
                        id = slide.df[[i]][["slide.id"]][j])
        
      }
      
    }
    
    rm(i, j, slide.df)

    if(!overwrite) {
      
      file.path <- gsub(".pptx", paste0(new.suffix, ".pptx"),
                        
                        file.path, fixed = TRUE)
      
    }
    
    print(pp, target = file.path)
    
    if(verbose) message("\nSaving modified deck as ", basename(file.path),
                        
                        ifelse(overwrite, " (overwriting original).", "."))
    
  } else {
    
    if(verbose) message("None of the slides use the Section Header layout. ",
                        "Exiting function now without any modification to original deck.")
    
  }

  return(file.path)
}

Demonstration

This is what the MS PowerPoint file looks like, after knitting directly from the toy example .Rmd file used in the question & creatively named 'demo.pptx':

before

divide.sections("demo.pptx") # default parameters: yes to adding subtitle, 
                             # no to overwrite, yes to verbosity

Loaded demo.pptx as rpptx object with 7 slides.

Found 2 slide(s) using the Section Header layout. Dividing 7 slides into 3 sections containing [1, 3, 3] slides respectively.

Saving modified deck as demo_sectioned.pptx.

This is the result:

after

Caveat

This function was developed to complement my existing R Markdown-to-PowerPoint workflow, and may not work with files created natively in PowerPoint (e.g. they might have templates with different names, or similar layout names but different placeholder names, etc.). When I started on this track, I had to clean up my team's customized PowerPoint template until it was practically sterile, before the knitting process stopped throwing up errors -- but the result was worth it for what I needed, and if anyone else is in the same boat, you may find it useful, too.

Z.Lin
  • 28,055
  • 6
  • 54
  • 94