I'm using pdftools to import text from a PDF into R, and readr to read it in line by line. It works for the first page but stops there.
Reading in all pages of a document seems like it should be simple, yet I get the same result with several different documents. Going by the example code below, is there a step I'm missing?
install.packages("pdftools")
install.packages("readr")
library(pdftools)
library(readr)
download.file("http://www.africau.edu/images/default/sample.pdf",
"sample.pdf")
sample <- pdf_text("sample.pdf")
sample <- read_lines(sample)
print(sample)
It might be relevant to add that running the read_lines command gives the following warning:
Warning message:
In if (grepl("\n", file)) { :
  the condition has length > 1 and only the first element will be used