I need to obtain the names of a large set of PDF files (36,000 files), but only the names, without loading the contents of each file. In the end I want a data frame containing those names.
Link to 21 example files: https://drive.google.com/drive/folders/1zUKyVJFICq4Q69zs48wqFNq1UPDvCgbf?usp=sharing
I am using this code:
#set directory
library(pdftools)
library(tm)
files=list.files(pattern = "pdf$")
files
all=lapply(files, pdf_text)
lapply(all, length)
x=Corpus(URISource(files), readerControl = list(reader = readPDF))
x
class(x) #character
DAT_FINAL <- data.frame(text = sapply(x, as.character), stringsAsFactors = T)
DAT_FINAL
The idea is to have a data frame, because I need to compare the numeric file names with an Excel file to find which numbers are missing among the documents.
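A minimal sketch of the comparison described above. It assumes the file names are purely numeric (for example `1001.pdf`) and that the numbers from the Excel file have already been read into a vector. The `files` and `expected` values here are hypothetical stand-ins so the example runs on its own; the real names would come from `list.files()`, which returns only names and does not open the PDFs.

```r
# In the real folder, only the names are read; no PDF content is loaded:
# files <- list.files(pattern = "\\.pdf$")
# Hypothetical stand-in for that result:
files <- c("1001.pdf", "1002.pdf", "1004.pdf")

# Strip the extension and convert to numbers
# (assumes every name is purely numeric).
DAT_FINAL <- data.frame(
  file = files,
  number = as.numeric(tools::file_path_sans_ext(files)),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the column of numbers read from the Excel file.
expected <- c(1001, 1002, 1003, 1004)

# Numbers listed in the Excel file that have no matching PDF.
missing_numbers <- setdiff(expected, DAT_FINAL$number)
missing_numbers
```

Because only the file names are handled, this should scale to 36,000 files without the memory cost of `pdf_text()` or building a `Corpus`.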
Update: