I need help extracting information from a PDF file in R (for example https://arxiv.org/pdf/1701.07008.pdf).
I'm using pdftools, but sometimes pdf_info() doesn't work, and in that case I can't manage to do it automatically with pdf_text().
NB: tabulizer didn't work on my PC.
Here is the processing I'm doing (sorry, you need to save the PDF and run it with your own path):
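library(pdftools)
# path_folder and pdf_path must point to the locally saved PDF
info <- pdf_info(file.path(path_folder, pdf_path))
# Append each metadata field to its accumulator vector
# (these are assumed to be initialised beforehand, e.g. title <- character(0))
title <- c(title, info$keys$Title)
key <- c(key, info$keys$Keywords)
auth <- c(auth, info$keys$Author)
dom <- c(dom, info$keys$Subject)
metadata <- c(metadata, info$metadata)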
I would like to get the title and abstract in most cases.
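To make the goal concrete, here is a minimal sketch of the kind of pdf_text() fallback I have in mind for when pdf_info() returns nothing useful. The function name guess_title_abstract is hypothetical, and the heuristics are assumptions for illustration only: the title is taken to be the first non-empty line of page 1, and the abstract is the text between an "Abstract" heading and the next "Introduction" or "Keywords" marker. Neither is something pdftools provides, and both will likely fail on some layouts.

```r
# Heuristic sketch (not part of pdftools): guess title and abstract from the
# text of the first page, as returned by pdftools::pdf_text(path)[1].
guess_title_abstract <- function(page_text) {
  # Assumption: the title is the first non-empty line of the page
  lines <- trimws(strsplit(page_text, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines)]
  title <- if (length(lines) > 0) lines[1] else NA_character_

  # Assumption: the abstract starts right after the word "Abstract"
  start <- regexpr("abstract", page_text, ignore.case = TRUE, perl = TRUE)
  if (as.integer(start) == -1) {
    return(list(title = title, abstract = NA_character_))
  }
  rest <- substring(page_text, as.integer(start) + attr(start, "match.length"))

  # Assumption: it ends at the next "Introduction" or "Keywords" line
  end <- regexpr("\\n\\s*(?:(?:\\d+\\.?\\s+)?introduction|keywords)",
                 rest, ignore.case = TRUE, perl = TRUE)
  if (as.integer(end) != -1) {
    rest <- substring(rest, 1, as.integer(end) - 1)
  }

  # Drop leading punctuation such as ":" or "." and collapse whitespace
  rest <- sub("^[[:space:][:punct:]]+", "", rest, perl = TRUE)
  abstract <- trimws(gsub("\\s+", " ", rest, perl = TRUE))
  list(title = title, abstract = abstract)
}

# Intended use with pdftools (needs the PDF saved locally):
# txt <- pdftools::pdf_text(file.path(path_folder, pdf_path))
# guess_title_abstract(txt[1])
```

The function works on plain text only, so it can be tried on a pasted string before involving any PDF.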