I am working on a Shiny app that reads Word documents uploaded by users. The app then displays a table of all elements in the uploaded document and their formatting. I want it to also show any pictures from the uploaded Word doc. Documents containing multiple images aren't an issue, since users will only ever upload documents with one image.
To do this, I am using the officer package. It has a function called media_extract that should do exactly what I want. The issue is that, while the documentation says this function can extract images from .docx or .pptx files, I can only get it to work for the latter. This is because media_extract takes the image file path as an argument, but I cannot generate a file path for Word docs. The file path comes from one of two officer functions, depending on the file type: docx_summary or pptx_summary. These are also the functions I use to generate the tables rendered in my app. pptx_summary creates a table with a media_file column, which holds a file path for image elements, while docx_summary generates no such column. Absent that column and the path it includes, I don't know how to extract images from Word docs using this function.
For your convenience, here is my code for two Shiny apps: one that reads PowerPoint files and one that reads Word docs. If you upload a PowerPoint file and a Word file that each include an image, you will see how the tables generated in each app differ. My PowerPoint app also renders the image, to show you how that is done. That functionality is missing from my Word app.
PowerPoint reader app:

    library(officer)
    library(DT)
    library(shiny)

    ui <- fluidPage(
      titlePanel("Document Scanner"),
      sidebarLayout(
        sidebarPanel(
          # read_pptx() only reads .pptx files, so only accept those
          fileInput("uploadedfile", "Upload a file", multiple = FALSE,
                    accept = c(".pptx"))
        ),
        mainPanel(
          tags$h3(tags$b("Document Summary")),
          br(),
          DT::dataTableOutput("display_table"),
          br(),
          imageOutput("myImage")
        )
      )
    )

    server <- function(input, output) {
      # Reactive rpptx object for the uploaded file
      x <- reactive({
        req(input$uploadedfile)  # wait until a file has been uploaded
        uploadedfileDF <- input$uploadedfile
        uploadedfileDataPath <- uploadedfileDF$datapath
        read_pptx(uploadedfileDataPath)
      })

      # Render the formatting table
      output$display_table <- DT::renderDataTable({
        req(input$uploadedfile)
        DT::datatable(pptx_summary(x()))
      })

      # Render the image from the PowerPoint file
      output$myImage <- renderImage({
        readFile <- x()
        fileSummaryDF <- pptx_summary(readFile)
        # Get the path to the image (this is basically straight from the
        # documentation for media_extract)
        fileSummaryDF_filtered <- fileSummaryDF[fileSummaryDF$content_type %in% "image", ]
        media_file <- fileSummaryDF_filtered$media_file
        png_file <- tempfile(fileext = ".png")
        media_extract(readFile, path = media_file, target = png_file)
        list(src = png_file,
             alt = "Test Picture")
      }, deleteFile = TRUE)  # the temp file is recreated on every render
    }

    shinyApp(ui, server)
Word reader app:

    library(officer)
    library(DT)
    library(shiny)

    ui <- fluidPage(
      titlePanel("Word Doc Scanner"),
      sidebarLayout(
        sidebarPanel(
          # read_docx() only reads .docx files, so only accept those
          fileInput("uploadedfile", "Upload a file", multiple = FALSE,
                    accept = c(".docx"))
        ),
        mainPanel(
          tags$h3(tags$b("Document Summary")),
          br(),
          DT::dataTableOutput("display_table"),
          imageOutput("image1")
        )
      )
    )

    server <- function(input, output) {
      # Reactive summary data frame for the uploaded file
      x <- reactive({
        req(input$uploadedfile)  # wait until a file has been uploaded
        print(input$uploadedfile)  # debug: show the upload metadata
        uploadedfileDF <- input$uploadedfile
        uploadedfileDataPath <- uploadedfileDF$datapath
        docDF <- read_docx(path = uploadedfileDataPath)
        docx_summary(docDF)
      })

      # Render the formatting table
      output$display_table <- DT::renderDataTable({
        req(input$uploadedfile)
        DT::datatable(x())
      })

      # How to render the image without an image path anywhere in the table?
    }

    shinyApp(ui, server)
If this can't be done in officer then I'm happy to do it a different way. Thank you.
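One possible route outside officer, offered as a minimal sketch rather than a confirmed solution: a .docx file is a zip archive, and embedded images are normally stored under word/media/ inside it, so base R's unzip() can list and extract them. The helper name extract_first_docx_image and its out_dir argument are hypothetical (they are not part of officer or Shiny), and the sketch assumes the single-image case described above.

```r
# Hypothetical helper (not part of officer): copy the first embedded image
# out of a .docx file and return the path of the extracted copy, or NULL
# if the archive has no entries under word/media/.
extract_first_docx_image <- function(docx_path,
                                     out_dir = tempfile("docx_media_")) {
  # List the archive contents without extracting anything
  entries <- utils::unzip(docx_path, list = TRUE)$Name

  # Assumption: embedded pictures live under word/media/ in the archive
  media <- grep("^word/media/", entries, value = TRUE)
  if (length(media) == 0) {
    return(NULL)
  }

  # Extract only the first media entry; unzip() keeps the internal
  # directory structure below out_dir
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  utils::unzip(docx_path, files = media[1], exdir = out_dir)
  file.path(out_dir, media[1])
}
```

In the Word app's server function, the returned path could then be passed to renderImage, for example by returning list(src = img, alt = "Image from uploaded document") where img is the result of extract_first_docx_image(input$uploadedfile$datapath), guarded with req(img). Two caveats: legacy .doc files are not zip archives, so this only applies to .docx uploads, and some embedded formats (EMF or WMF, for instance) may not display in a browser even after extraction.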