I am running into the following error while extracting text from PDF documents on macOS. I read multiple PDF files from a folder, parse them, and then write the results to a CSV file. This worked fine before, but I can't figure out what changed to cause this error.
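```r
library(pdftools)  # provides pdf_text()
setwd("~/Downloads/mainfolder/myfolder/")
my_pdfs <- list.files(pattern = "\\.pdf$")  # list the file names ending in .pdf
pdf_to_txt <- lapply(my_pdfs, pdf_text)     # extract text: one character vector (one string per page) per file
```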
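Because one unreadable file makes the whole `lapply()` call fail, it can help to run the extraction file by file and record which file breaks. The sketch below is only one way to do that: `extract_all()` is a hypothetical helper (not part of pdftools), and the CSV layout (one row per file, pages joined with newlines, output name `pdf_text.csv`) is an assumption, since the question doesn't show the original CSV step.

```r
# Hypothetical helper (not part of pdftools): extract text from each file,
# but keep going when one file fails so the offending file can be identified.
# `extract` defaults to pdftools::pdf_text and is only evaluated when used.
extract_all <- function(files, extract = pdftools::pdf_text) {
  texts <- vapply(files, function(f) {
    tryCatch(
      # pdf_text() returns one string per page; join the pages into one string
      paste(extract(f), collapse = "\n"),
      error = function(e) {
        # Report the failing file instead of aborting the whole loop
        message("Failed on ", f, ": ", conditionMessage(e))
        NA_character_
      }
    )
  }, character(1), USE.NAMES = FALSE)
  data.frame(file = files, text = texts, stringsAsFactors = FALSE)
}

# Usage (run from the folder that contains the PDFs):
# result <- extract_all(list.files(pattern = "\\.pdf$"))
# write.csv(result, "pdf_text.csv", row.names = FALSE)  # hypothetical file name
```

Files that fail end up with `NA` in the `text` column, so `result$file[is.na(result$text)]` lists the ones to inspect.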
The error:

```
PDF error: May not be a PDF file (continuing anyway)
PDF error (1): Illegal character '{'
PDF error (21): Illegal character '{'
PDF error (272): Illegal character '}'
PDF error (278): Illegal character '}'
PDF error: Couldn't find trailer dictionary
PDF error: Couldn't find trailer dictionary
PDF error: Couldn't read xref table
Error in poppler_pdf_text(loadfile(pdf), opw, upw) : PDF parsing failure.
```
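The messages "May not be a PDF file" and "Illegal character '{'" are consistent with at least one matched file not actually being a PDF (for example, a text or JSON file that happens to have a `.pdf` name). One way to rule that out is to check each file's first bytes: the PDF specification says a PDF file's first line is a header beginning with `%PDF-`. The helper `has_pdf_header()` below is a hypothetical, base-R-only sketch; it checks strictly at byte 0, and since some readers tolerate bytes before the header, a `FALSE` is a file worth opening by hand rather than proof of corruption.

```r
# Hypothetical helper: TRUE if the file starts with the "%PDF-" signature
# that the PDF specification requires at the start of the file.
has_pdf_header <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  # Read at most 5 bytes; an empty file yields raw(0) and returns FALSE
  bytes <- readBin(con, what = "raw", n = 5)
  identical(bytes, charToRaw("%PDF-"))
}

# Usage (run from the folder that contains the PDFs):
# my_pdfs <- list.files(pattern = "\\.pdf$")
# my_pdfs[!vapply(my_pdfs, has_pdf_header, logical(1))]  # files without a PDF header
```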
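Resolved: the following steps fixed the issue.

In a terminal:

```shell
pip install poppler
```

Then in R:

```r
remove.packages("pdftools")                          # uninstall the existing package
install.packages("pdftools", dependencies = TRUE)    # reinstall it with its dependencies
```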