I have a table with a column of PDF file paths, and I am trying to repeat the commands below for every PDF listed. For each file, I convert only the first page to text and then use the keyword_search command to search for certain phrases on that page. This works for one file at a time, but I have 281 files. What am I missing?
ONE PDF FILE
my.file<-"//.../cover-letter.pdf"
my.page<-pdf_text(my.file)[1] %>% as.character()
my.result<-keyword_search(my.page, keyword = c('reason','not being marketed', 'available for sale', 'withdrawn from sale', 'commercial distribution', 'target date'), ignore_case = TRUE)
my.result$Cover_Letter<-my.file
my.result<-select(my.result, -5)
result<-merge(TotNoMark_clean, my.result, by = "Cover_Letter", all.x = TRUE)
MULTIPLE PDF FILES: FAILED ATTEMPT
DF<-as.data.frame(TotNoMark_clean)
file.names<-DF$Cover_Letter
for(i in 1:length(file.names)){
{pdf_pages<-pdf_text(file.names[i])[1]
pdf_result<-keyword_search(pdf_pages, keyword = c('reason','not being marketed', 'available for sale', 'withdrawn from sale', 'commercial distribution', 'target date'))
pdf_result$Cover_Letter<-file.names[i]
if (!nrow(pdf_result)) {next}
}
Result<<-pdf_result
}
Result<-select(Result, -5)
Result<-merge(DF, Result, by = "Cover_Letter", all.x = TRUE)
This is the error message I get:
"Error in `$<-.data.frame`(`*tmp*`, "Cover_Letter", value = "//cover-letters/***.pdf") :
replacement has 1 row, data has 0"
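The error appears to be reproducible without any PDFs. Below is a minimal base-R sketch, assuming that keyword_search returns a zero-row data frame when a page contains none of the keywords (not confirmed here). fake_search, pages, and the file names are hypothetical stand-ins for illustration, not part of pdftools or pdfsearch.

```r
# Hypothetical stand-in for keyword_search(): returns a zero-row data frame
# when the page contains none of the keywords (an assumption, not confirmed).
fake_search <- function(page, keywords) {
  found <- vapply(keywords, function(k) grepl(k, page, fixed = TRUE), logical(1))
  data.frame(keyword = keywords[found], stringsAsFactors = FALSE)
}

# Made-up first-page text for two made-up files
pages <- c("a.pdf" = "target date is unknown", "b.pdf" = "nothing relevant here")
keywords <- c("reason", "target date")

# Reproduce the error: adding a length-1 column to a zero-row data frame
empty <- fake_search(pages[["b.pdf"]], keywords)
err <- tryCatch({
  empty$Cover_Letter <- "b.pdf"
  NA_character_
}, error = function(e) conditionMessage(e))
print(err)

# Possible pattern: check nrow() before adding the column, and keep each
# file's result in a list instead of overwriting a single object
results <- list()
for (f in names(pages)) {
  res <- fake_search(pages[[f]], keywords)
  if (nrow(res) == 0) next   # skip files with no matches
  res$Cover_Letter <- f
  results[[f]] <- res
}
Result <- do.call(rbind, results)
print(Result)
```

In this sketch the nrow() check comes before the column assignment, whereas in the failed attempt above it comes after. Collecting into a list also keeps every file's matches, rather than only whatever the last iteration assigned to Result.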