I am working in R Markdown to produce a report that extracts and displays images from Word and PowerPoint files.
To do this, I am using the officer package. It has a function called media_extract, which can 'extract files from an rdocx or rpptx object'.
I have two issues:
- How to view or use the image after I have located it.
- In Word, how to locate the image without the media_file column.
I have been able to locate an image in a .pptx file: the pptx_summary function creates a data frame with a media_file column, which holds a file path for image elements. That path is then passed as the path argument of media_extract to locate the image. See the example code from the package documentation below:
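library(officer)

# Read the example presentation shipped with officer
example_pptx <- system.file(package = "officer",
                            "doc_examples/example.pptx")
doc <- read_pptx(example_pptx)
# Summarise the content and keep the rows that describe images
content <- pptx_summary(doc)
image_row <- content[content$content_type %in% "image", ]
media_file <- image_row$media_file
# Copy the image out of the presentation into a temporary file
png_file <- tempfile(fileext = ".png")
media_extract(doc, path = media_file, target = png_file)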
However, when I run media_extract it returns 'TRUE', which matches the example output, but I am unsure how to add the image to my report. I've tried assigning the result of media_extract to a variable, e.g.
image <- media_extract(doc, path = media_file, target = png_file)
but this returns 'FALSE'.
How do I include the extracted image in my report?
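A minimal sketch of one way this could work, assuming that media_extract only copies the image to target (so its return value is a success flag, not the image itself) and that knitr is available, as it normally is when knitting R Markdown. Taking the first image row with [1] and calling knitr::include_graphics on the temporary file are my assumptions, not something from the officer documentation:

```r
library(officer)

# Locate the image inside the example presentation shipped with officer
example_pptx <- system.file(package = "officer", "doc_examples/example.pptx")
doc <- read_pptx(example_pptx)
content <- pptx_summary(doc)
image_row <- content[content$content_type %in% "image", ]
media_file <- image_row$media_file[1]  # assumption: use the first image found

# Copy the image to a fresh temporary file; `ok` is only a logical flag
png_file <- tempfile(fileext = ".png")
ok <- media_extract(doc, path = media_file, target = png_file)

# The image itself is the file at `png_file`; inside an R Markdown chunk
# this call is what would place it in the rendered report
knitr::include_graphics(png_file)
```

If the copy does not overwrite an existing target, which I assume is the case, a second call with the same png_file would return 'FALSE' even though the file is already there, so checking file.exists(png_file) may be more informative than the return value.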
The second issue I'm having is how to locate an image in Word. The documentation for media_extract says it can be used to extract images from both .docx and .pptx files, but I have only managed to get it to work for the latter. I haven't been able to create a file path for .docx.
The file path comes from the summary data frame created by either docx_summary or pptx_summary, depending on the file type. The pptx_summary data frame includes a media_file column, which holds a file path for the image. The docx_summary data frame doesn't include this column. Another Stack Overflow post proposed a solution using the word/media/ subdirectory, which seemed to work, but I'm not sure what this means or how to use it.
How do I extract an image from a Word document, using the word/media/ subdirectory as the media path?
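For context on the word/media/ idea: a .docx file is a zip archive, and embedded images are stored inside it under word/media/. Below is a minimal sketch, assuming media_extract accepts a path relative to the archive root (as the media_file values in the pptx example appear to be). The names list_docx_media and extract_docx_media are hypothetical helpers, not part of officer:

```r
library(officer)

# Hypothetical helper: list the image entries stored inside a .docx.
# A .docx is a zip archive, so its contents can be listed without extracting.
list_docx_media <- function(docx_path) {
  entries <- utils::unzip(docx_path, list = TRUE)$Name
  entries[grepl("^word/media/[^/]+$", entries)]
}

# Hypothetical helper: copy every embedded image into out_dir and
# return the paths of the copies that exist afterwards.
extract_docx_media <- function(docx_path, out_dir = tempdir()) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  doc <- read_docx(docx_path)
  media <- list_docx_media(docx_path)
  targets <- file.path(out_dir, basename(media))
  for (i in seq_along(media)) {
    # Assumption: `path` is relative to the root of the unpacked document,
    # e.g. "word/media/image1.png"
    media_extract(doc, path = media[i], target = targets[i])
  }
  targets[file.exists(targets)]
}
```

Calling extract_docx_media("report.docx") (a placeholder file name) would return the paths of the extracted files, which could then be passed to knitr::include_graphics() in a report chunk.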