Questions tagged [pdftools]

An R package for Text Extraction, Rendering and Converting of PDF Documents

Utilities based on 'libpoppler' for extracting text, fonts, attachments and metadata from a PDF file. Also supports high quality rendering of PDF documents into PNG, JPEG, TIFF format, or into raw bitmap vectors for further processing in R.

97 questions
0
votes
1 answer

Difficulty installing R package pdftools

I am trying to install the R package “pdftools” and encounter the following error: In file included from libqpdf/Pl_DCT.cc:1: include/qpdf/Pl_DCT.hh:27:10: fatal error: 'jpeglib.h' file not found #include <jpeglib.h> ^ 1 error generated. make: ***…
marcel
  • 389
  • 1
  • 8
  • 21
0
votes
2 answers

Extracting multiple phrases from multiple PDFs simultaneously using R

I have a list of PDF file paths in one table, and I am trying to repeat the commands below for the rest of the PDFs listed. Basically, I am converting the PDF file to text for the file's first page only and then using the keyword_search command to run…
Siren
  • 1
  • 1
0
votes
0 answers

Extract data from PDF to CSV using R

I was using this code to extract data from my PDF: tx <- pdf_text("Name.pdf") tx2 <- unlist(str_split(tx, "[\\r\\n]+")) tx3 <- str_split_fixed(str_trim(tx2), "\\s{2,}", 5) write.csv(tx3, file="Path\\ds1.csv") But this uses the end of line to separate…
0
votes
1 answer

How to mutate a large number of columns onto a data frame at once in R using a custom function with pdftools and html links?

Sorry if this is long or not structured correctly; it's my first question and first major R side project! Let me know if I should change anything about my questions for the future. I am currently working with some city traffic data that is stored…
-1
votes
1 answer

Append values from a data frame to a list created in for loop

*Edit: Thanks to Martin and a little bit of time and attention, I was able to get the code where I needed it to be. Is it ugly? Yes, but it works in a way that's useful to me now. Any tips on how to clean this up and make it more efficient would be…
James R.
  • 91
  • 9
-1
votes
2 answers

PDF conversion to CSV in R

I am trying to load the following PDF into R and convert the table into a CSV file. I have tried both library(pdftools) and library(tabulizer), and I have spent an afternoon going through various forums, but I cannot seem to find an answer that…
Andy
  • 413
  • 2
  • 15
-2
votes
1 answer

How to capture files with the same name only with the .pdf extension

I'm using R because I need to capture files with the same name, but only those with the .pdf extension. See the attached image. The file with the Excel extension doesn't interest me. The files have similar names. I tried…