I have tables in ppt or pptx files, and I want to extract them as data.frames in R. Are there any solutions? Thanks.
Alternative: convert the ppt(x) to PDF in R, then extract the tables using other packages. Are there any packages that convert ppt to PDF?
I hope this works for you. The code is in Python (it uses the python-pptx package), but you can easily adapt it for R.
```python
from pptx import Presentation  # provided by the python-pptx package

# placeholder: replace with the path to your .pptx file
path_to_presentation = "presentation.pptx"
prs = Presentation(path_to_presentation)

# text_runs will be populated with a list of strings,
# one for each text run inside a table cell of the presentation
text_runs = []

for slide in prs.slides:
    for shape in slide.shapes:
        # skip shapes that do not hold a table
        if not shape.has_table:
            continue
        tbl = shape.table
        row_count = len(tbl.rows)
        col_count = len(tbl.columns)
        for r in range(row_count):
            for c in range(col_count):
                cell = tbl.cell(r, c)
                for paragraph in cell.text_frame.paragraphs:
                    for run in paragraph.runs:
                        text_runs.append(run.text)

print(text_runs)
```
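The loop above collects every run into one flat list, so the row and column layout is lost. As a minimal sketch (not part of the original answer), the following reads the tables straight from the .pptx archive using only the Python standard library and keeps one list per table row. It assumes a .pptx file (not the legacy .ppt format). `extract_tables` and `table_to_csv` are hypothetical helper names, and slides are visited in file-name order, which may differ from the order shown in the presentation.

```python
import csv
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used by table elements inside slide XML
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml")


def extract_tables(pptx_file):
    """Return a list of (slide_file_name, rows) for every table found.

    pptx_file may be a path or a binary file-like object;
    rows is a list of lists of cell strings.
    """
    tables = []
    with zipfile.ZipFile(pptx_file) as zf:
        # sort slide parts by their numeric suffix (slide2 before slide10)
        slides = sorted(
            (int(m.group(1)), m.group(0))
            for m in map(SLIDE_RE.fullmatch, zf.namelist()) if m
        )
        for _, name in slides:
            root = ET.fromstring(zf.read(name))
            for tbl in root.iter(A + "tbl"):
                rows = []
                for tr in tbl.findall(A + "tr"):
                    row = []
                    for tc in tr.findall(A + "tc"):
                        # concatenate runs per paragraph, join paragraphs with newlines
                        paragraphs = [
                            "".join(t.text or "" for t in p.iter(A + "t"))
                            for p in tc.iter(A + "p")
                        ]
                        row.append("\n".join(paragraphs))
                    rows.append(row)
                tables.append((name, rows))
    return tables


def table_to_csv(rows):
    """Serialize one table (list of rows) as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Writing the output of `table_to_csv` to a file gives something R can load with `read.csv`, which returns a data.frame.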
Try the eoffice package, which is published on CRAN, and use its inpptx function:
library(eoffice)
## totable writes a table (here, the t-test results) to a pptx file
totable(t.test(wt ~ am, mtcars), filename = file.path(tempdir(), "mtcars.pptx"))
## inpptx and indocx read the tables in a pptx or docx file
tabs <- inpptx(filename = file.path(tempdir(), "mtcars.pptx"), header = TRUE)
To convert a PowerPoint file to a PDF in R, you can consider the following approach (it requires Windows with PowerPoint installed):
library(RDCOMClient)
pptapp <- COMCreate("PowerPoint.Application")
pptapp[["Visible"]] <- TRUE
pptpres <- pptapp$Presentations()$Open("D:\\ppt_With_Table.pptx")
# FileFormat = 32 corresponds to ppSaveAsPDF
pptpres$SaveAs("D:\\ppt_With_Table.pdf", FileFormat = 32)
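If PowerPoint is not available, one alternative that is not from the answer above is LibreOffice's headless converter. The sketch below is only an illustration under assumptions: LibreOffice must be installed with its `soffice` binary on the PATH, and `build_convert_command` and `convert_to_pdf` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pptx_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for a headless PDF conversion."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(pptx_path)]


def convert_to_pdf(pptx_path, out_dir="."):
    """Run the conversion and return the path where the PDF is expected."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH; is LibreOffice installed?")
    subprocess.run(build_convert_command(pptx_path, out_dir), check=True)
    # LibreOffice names the output after the input file, with a .pdf extension
    return Path(out_dir) / (Path(pptx_path).stem + ".pdf")
```

The same `soffice` command can also be launched from R with `system2`, which keeps the whole workflow in one language.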
To extract a table from a PowerPoint file, you can consider the following approach:
library(RDCOMClient)
pptapp <- COMCreate("PowerPoint.Application")
pptapp[["Visible"]] <- TRUE
pptpres <- pptapp$Presentations()$Open("D:\\Dropbox\\Reponses_Stackoverflow\\stackoverflow_401\\ppt_With_Table.pptx")
# The example table is 3 x 3; adjust the dimensions to match your table
mat_Table1 <- matrix(NA, nrow = 3, ncol = 3)
for(i in 1 : 3)
{
  for(j in 1 : 3)
  {
    # Read the text of cell (i, j) from the table in the first shape on slide 1
    mat_Table1[i,j] <- pptapp[["ActivePresentation"]]$Slides(1)$Shapes(1)$Table()$Cell(i,j)$Shape()$TextFrame()$TextRange()$Text()
  }
}
# Convert the character matrix to a data.frame
df_Table1 <- as.data.frame(mat_Table1)