Use R to change multiple PDFs to texts and put in a dataframe

Asked Nov 05 '22 at 14:07

Active Nov 05 '22 at 14:09

Viewed 44 times

I have a few hundred PDFs that I need to convert to text. I do not need to save the text files; instead, I want to extract certain sentences from the text. I have managed to do this for a single PDF file using pdftools.

Now I need to do the same for all my PDFs. I tried the following, but it didn't work properly.

files <- list.files(path = "my path", pattern = ".pdf", full.names = TRUE) 

pdf2text <- function(x){
x <- pdftools::pdf_text() %>% 
sapply(files, x) %>%
return()
}

Could anyone help me please? Thank you.

*Ideally, the texts would end up in a data frame, with each text identified by its file name.

edited Nov 05 '22 at 14:09

asked Nov 05 '22 at 14:07

user19192947

`sapply()` doesn’t go inside the function you want to iterate over - instead, try `x <- sapply(files, pdftools::pdf_text)`. – zephryl Nov 05 '22 at 14:18
Even the assignment to x and the `return()` statement are unnecessary. – Ric Nov 05 '22 at 16:06
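
Following the comments above, here is a minimal sketch of one way to get one row per file. It assumes that `pdftools::pdf_text()` returns a character vector with one element per page, so the pages are pasted together into a single string per PDF. The helper name `pdfs_to_df` and its `reader` argument are hypothetical and were not part of the original post; `"my path"` is the placeholder from the question.

```r
# Hypothetical helper (not from the original post): read each PDF and
# return a data frame with one row per file. `reader` defaults to
# pdftools::pdf_text, which yields one string per page.
pdfs_to_df <- function(files, reader = pdftools::pdf_text) {
  texts <- vapply(
    files,
    function(f) paste(reader(f), collapse = "\n"),  # join pages into one string
    character(1)
  )
  data.frame(
    file = basename(files),   # file name without the directory part
    text = unname(texts),
    stringsAsFactors = FALSE
  )
}

# "my path" is the placeholder from the question. "\\.pdf$" matches only
# names ending in ".pdf"; the unescaped "." in ".pdf" matches any character.
files <- list.files(path = "my path", pattern = "\\.pdf$", full.names = TRUE)
pdf_df <- pdfs_to_df(files)
```

Sentences can then be extracted from `pdf_df$text` while `pdf_df$file` keeps track of where each text came from. If per-page text is needed instead, `lapply(files, pdftools::pdf_text)` keeps the pages separate as a list.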


0 Answers