Highest Voted 'pdftools' Questions

0 votes

1 answer

How to change tesseract's Page Segmentation Mode (PSM) using R?

I would like to read a scanned PDF document into R using tesseract. In general, this already works quite well, but I have problems when the documents have a table structure. After some time of research I found out that there is a parameter to set…

asked Nov 05 '21 at 14:03

RKF

131
7

0 votes

1 answer

Do I need to use RSelenium to download these PDFs?

I am trying to use rvest and pdftools to go through this page and download the PDFs. I'm having trouble using a CSS selector to do this, and I'm wondering if this might take a webdriver. Also, is it easy enough to use a webdriver to do this in R - as a…

r selenium rvest rselenium pdftools

asked Oct 04 '21 at 17:41

paulson

3
1

0 votes

1 answer

How do I combine some vector elements in the same vector using R?

I extracted a table from a PDF using pdftools in R. The table in the PDF has multi-line text in the columns. I replaced runs of more than 2 spaces with "|" so that it's easier. But the problem I'm running into is that because of the multi-line and…

r pdftools

asked Aug 25 '21 at 19:10

user1828605

1,723
1
24
63

0 votes

1 answer

How to systematically extract data from a textbook

{edited} Hi everyone! I'm attempting to systematically extract data from a textbook (PDF). Because this task doesn't easily translate to a reproducible example, I'm providing 2 pages from the book as an example here. These two pages contain a list of…

r data-mining stringr regular-language pdftools

asked Jul 21 '21 at 17:49

J. Alex Baecher

45
5

0 votes

1 answer

How to convert all pages of a pdf into a single page pdf document in R

I have tried exploring pdftools. It does have a pdf_combine() function which stitches multiple PDFs into one. However, it doesn't help combine multiple pages of a PDF document into one page.

r pdf pdftools

asked Jul 14 '21 at 17:59

Ravi Shankar Hela

93
1
11

0 votes

0 answers

How to group and aggregate a data.table based on a range of a variable in R

I have this output from the pdftools pdf_data() for a page of the financial statements of a town. Unfortunately, in rare cases, the capture of a line y is slightly off, as shown below. I would like to be able to group on y including cases where y is…

r data.table pdftools

asked May 20 '21 at 16:16

David Lucey

252
3
9

0 votes

0 answers

How to save a .pdf file with the correct filename when special characters are used in pdftools::pdf_subset() in R

I hope someone can help me. I use pdf_subset() from the pdftools package to select some pages from a .pdf file and save them in a new .pdf file. However, there is a problem: my path/filename contains special characters (Polish letters) which are replaced by…

r encoding special-characters polish pdftools

asked May 19 '21 at 13:10

Vasyl Mohytych

93
8

0 votes

1 answer

Reading PDF portfolio in R

Is it possible to read/convert PDF portfolios in R? I usually use pdftools, however, I get an error: library(pdftools) #> Using poppler version 0.73.0 link <-…

r pdf pdftools

asked May 06 '21 at 00:12

ava

840
5
19

0 votes

3 answers

Creating a loop for "load" and "save" processes

I have a data.frame (dim: 100 x 1) containing a list of url links, each url looks something like this: https:blah-blah-blah.com/item/123/index.do . The list (the list is a data.frame called my_list with 100 rows and a single column named col and is…

r for-loop pdftools

asked Apr 09 '21 at 19:28

stats_noob

5,401
4
27
83

0 votes

1 answer

How to replace a column value if it starts with the character "N" in R

How to replace if a value of the column (GID) starts with char "N" to ColB if the ColB is empty in a Dataframe in R programming code: DataFile <- extract_tables("new.pdf",pages = c(87), method = "stream", output =…

r tabulizer pdftools

asked Feb 04 '21 at 07:58

kumar

5
5

0 votes

0 answers

How to merge specific columns with the next column without hardcoding in R

How to merge columns named "X" with the next column without hardcoding in R: X should be merged into Day.7, X.1 should be merged into Day.8, and X.2 and X.3 should be merged into Day.9. Code: library(data.table) library(tabulizer) pdf_file…

r tabulizer pdftools

asked Jan 25 '21 at 07:26

kumar

5
5

0 votes

2 answers

How to remove column labels if the name of the label starts with "G" in R programming

How to remove column labels if the name of the label starts with "G" code: library(pdftools) library(data.table) library(tabulizer) pdf_file <- "new.pdf" out2 <- extract_tables(pdf_file, pages =c(89), output =…

r dataframe pdftools tabulizer

asked Jan 22 '21 at 15:10

kumar

5
5

0 votes

1 answer

How to rename a column header to match the next column in R

How to rename column headers that have "X or X.1 or X.3" values, but it should refer and rename with the next column's header. code: library(pdftools) library(data.table) library(tabulizer) pdf_file <- "new.pdf" out2 <- extract_tables(pdf_file,…

r dplyr pdftools tabulizer

asked Jan 21 '21 at 12:45

kumar

5
5

0 votes

1 answer

Scraping PDF in R with Nested Information

I am attempting to scrape a rather difficult PDF in R using both pdftools::pdf_text and tabulizer::extract_tables. However, in my situation, neither of these seems to be too helpful based on the nature of the PDF. The PDF contains "nested"…

r pdf pdf-scraping pdftools tabulizer

asked Jan 20 '21 at 20:05

mikeytop

150
9

0 votes

1 answer

Ways to extract images from pdf using R

Is there a way to extract images from a PDF using R and save them into a folder? There are a lot of similar questions regarding other programming languages and there is apparently a way to do this in Python, was wondering if the same work can be…

r pdf pdftools

asked Nov 13 '20 at 18:06

Bahi8482

489
5
15
