
I have tables in PowerPoint files (.ppt or .pptx), and I want to extract them into R as data.frames. Any solutions? Thanks.

Alternatively, I could convert the ppt(x) file to PDF in R and extract the tables with other packages. Are there any packages that convert ppt to pdf?

Hope

3 Answers


Hope this works for you. Note that the code is in Python and uses the python-pptx package (which reads .pptx but not the older .ppt format); you can adapt the same logic for R.

from pptx import Presentation  # provided by the python-pptx package

path_to_presentation = "ppt_With_Table.pptx"  # replace with the path to your .pptx file
prs = Presentation(path_to_presentation)
# text_runs will be populated with a list of strings,
# one for each text run found inside a table cell
text_runs = []
for slide in prs.slides:
    for shape in slide.shapes:
        # skip every shape that is not a table
        if not shape.has_table:
            continue
        tbl = shape.table
        row_count = len(tbl.rows)
        col_count = len(tbl.columns)
        for r in range(row_count):
            for c in range(col_count):
                cell = tbl.cell(r, c)
                paragraphs = cell.text_frame.paragraphs
                for paragraph in paragraphs:
                    for run in paragraph.runs:
                        text_runs.append(run.text)

print(text_runs)
ALee

Please try the eoffice package, which is published on CRAN, and use its inpptx function:

library(eoffice)
## write an example table to a temporary pptx file
totable(t.test(wt ~ am, mtcars), filename = file.path(tempdir(), "mtcars.pptx"))
## inpptx() and indocx() read the tables in a pptx or docx file
tabs <- inpptx(filename = file.path(tempdir(), "mtcars.pptx"), header = TRUE)
bioguo

To convert a PowerPoint file to PDF in R, you can automate PowerPoint through RDCOMClient (this requires Windows with PowerPoint installed):

library(RDCOMClient)
pptapp <- COMCreate("PowerPoint.Application")
pptapp[["Visible"]] <- TRUE
pptpres <- pptapp$Presentations()$Open("D:\\ppt_With_Table.pptx")
# FileFormat = 32 is ppSaveAsPDF
pptpres$SaveAs("D:\\ppt_With_Table.pdf", FileFormat = 32)
pptpres$Close()
pptapp$Quit()
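RDCOMClient needs Windows and an installed copy of PowerPoint. Where that is not available, LibreOffice can do the same conversion from the command line with `soffice --headless --convert-to pdf`, and it also accepts .ppt files. A minimal sketch in Python, assuming LibreOffice is installed and its `soffice` executable is on the PATH; `build_convert_cmd` and `convert_to_pdf` are hypothetical helper names:

```python
import subprocess
from pathlib import Path


def build_convert_cmd(pptx_path, outdir, soffice="soffice"):
    """Build the LibreOffice command that converts a .ppt/.pptx file to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(pptx_path)]


def convert_to_pdf(pptx_path, outdir, soffice="soffice"):
    """Run the conversion (raises CalledProcessError on a non-zero exit)
    and return the expected output path."""
    subprocess.run(build_convert_cmd(pptx_path, outdir, soffice), check=True)
    # LibreOffice names the PDF after the input file: deck.pptx -> outdir/deck.pdf
    return Path(outdir) / (Path(pptx_path).stem + ".pdf")
```

From R, the same command can be run with `system2("soffice", c("--headless", "--convert-to", "pdf", "--outdir", tempdir(), "ppt_With_Table.pptx"))`.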

To extract a table from a PowerPoint file, you can use the following approach:

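library(RDCOMClient)
pptapp <- COMCreate("PowerPoint.Application")
pptapp[["Visible"]] <- TRUE
pptpres <- pptapp$Presentations()$Open("D:\\Dropbox\\Reponses_Stackoverflow\\stackoverflow_401\\ppt_With_Table.pptx")

# assumes the table is the first shape on the first slide
tbl <- pptpres$Slides(1)$Shapes(1)$Table()
n_row <- tbl$Rows()$Count()
n_col <- tbl$Columns()$Count()
mat_Table1 <- matrix(NA_character_, nrow = n_row, ncol = n_col)

for(i in 1 : n_row)
{
  for(j in 1 : n_col)
  {
    # read the text of cell (i, j)
    mat_Table1[i, j] <- tbl$Cell(i, j)$Shape()$TextFrame()$TextRange()$Text()
  }
}

df_Table1 <- as.data.frame(mat_Table1, stringsAsFactors = FALSE)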
Emmanuel Hamel