I have a folder of PDFs, for example foo1.pdf, foo2.pdf and foo3.pdf.
I would like to read these PDFs in RStudio and create a data frame with two columns: the document name and the corresponding text. For example:
```r
Document <- c("foo1", "foo2", "foo3")
Text <- c("text in foo1", "text in foo2", "text in foo3")
DF <- data.frame(Document, Text)
```
What I have tried so far without success:
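```r
setwd("path to files")
library(pdftools)
files <- list.files(pattern = "pdf$", full.names = TRUE)
# pdf_text() returns one string per page, so this is a list holding
# one character vector (of varying length) per file
filestext <- lapply(files, pdf_text)
filestextDF <- as.data.frame(matrix(filestext, ncol = 2, byrow = FALSE))
names(filestextDF) <- c("Document", "Text")
```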
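A likely reason the attempt above fails, sketched with made-up stand-in page text rather than real PDFs: `pdf_text()` returns one element per page, so `filestext` is a list with one variable-length character vector per file. Forcing that list into `matrix(..., ncol = 2)` does not pair names with text. With three files, R recycles the list to fill a 2 x 2 matrix (with a warning), and the file names never enter the result at all.

```r
# Hypothetical stand-in for lapply(files, pdf_text); no PDFs are read here.
filestext <- list(
  c("foo1 page 1", "foo1 page 2"),
  "foo2 page 1",
  c("foo3 page 1", "foo3 page 2", "foo3 page 3")
)

# Pages per document: each list element can have a different length.
pages_per_doc <- lengths(filestext)
print(pages_per_doc)

# The attempted reshape: 3 list elements spread over 2 columns.
# R recycles the data into a 2 x 2 list-matrix; the warning is silenced here.
m <- suppressWarnings(matrix(filestext, ncol = 2, byrow = FALSE))
print(dim(m))

# The 4th cell (row 2, column 2) is a recycled copy of the first document.
print(identical(m[[2, 2]], filestext[[1]]))
```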
How can I achieve this?
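A minimal sketch of one way to get the desired `DF`, assuming one row per PDF with all of its pages joined into a single string. `pdfs_to_df` is a hypothetical helper name, the newline used to join pages is an arbitrary choice, and the demo passes stand-in page text instead of real PDFs so it runs without any files.

```r
# Hypothetical helper: `files` are PDF paths, `pages` is a list with one
# character vector of page text per file (the shape lapply(files, pdf_text)
# produces).
pdfs_to_df <- function(files, pages) {
  data.frame(
    # "./foo1.pdf" -> "foo1"
    Document = tools::file_path_sans_ext(basename(files)),
    # Collapse each document's pages into one string (one value per file)
    Text = vapply(pages, paste, character(1), collapse = "\n"),
    # Default since R 4.0.0; kept explicit for older R versions
    stringsAsFactors = FALSE
  )
}

# Demo with made-up page text standing in for pdf_text() output
files <- c("./foo1.pdf", "./foo2.pdf", "./foo3.pdf")
pages <- list(c("text in", "foo1"), "text in foo2", c("text", "in", "foo3"))
DF <- pdfs_to_df(files, pages)
print(DF)
```

With real files, the call would look like `files <- list.files("path to files", pattern = "\\.pdf$", full.names = TRUE)` followed by `DF <- pdfs_to_df(files, lapply(files, pdftools::pdf_text))`. Passing the folder to `list.files()` also removes the need for `setwd()`.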