How to read a PDF by blocks rather than by lines in R using "pdftools"?

Asked Aug 10 '23 at 16:13

Active Aug 10 '23 at 16:13


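With the {pdftools} package, we can read a PDF into the R environment. However, it reads the text line by line rather than block by block, so when a page has multiple columns, each output line contains text from both columns side by side and the result becomes a mess.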

For example, we would like the output to look like this:

but it comes out like this:

I have tried reading it into a data.table and splitting it into two columns, but the attempt failed because the code can't tell whether a single space separates words or columns.
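One way around the single-space ambiguity, offered here as an untested sketch under stated assumptions rather than a confirmed solution, is to skip the text layout entirely and work from word coordinates: `pdftools::pdf_data()` returns, for each page, a data frame with one row per word and its `x`/`y` position, so each word can be assigned to a column by its horizontal position instead of by counting spaces. The helper name `columns_from_words()` and the `split_x` cut-off are hypothetical; the sketch assumes exactly two columns and that words on the same printed line share the same `y` value.

```r
# Hypothetical helper: rebuild two-column reading order from word positions.
# `words` is assumed to be one page of pdftools::pdf_data() output, i.e. a data
# frame with (at least) columns x, y and text; `split_x` is an assumed
# horizontal cut-off between the two columns.
columns_from_words <- function(words, split_x) {
  words$col <- ifelse(words$x < split_x, "left", "right")
  # sort top-to-bottom, then left-to-right within each printed line
  words <- words[order(words$y, words$x), ]
  sapply(c("left", "right"), function(side) {
    w <- words[words$col == side, ]
    # assumes words on the same printed line share the same y value
    line_text <- tapply(w$text, w$y, paste, collapse = " ")
    paste(line_text, collapse = "\n")
  })
}
```

On a real file this might be called as `columns_from_words(pdftools::pdf_data("paper.pdf")[[1]], split_x = 300)`, where `"paper.pdf"` and `300` are placeholders; a suitable `split_x` would have to be read off the page, for example from the gap between the two columns' `x` values.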

Please advise.


Grec001


I found a post about this issue here: https://stackoverflow.com/questions/72229791/scraping-two-column-pdf – Grec001 Aug 11 '23 at 03:35
The concept is simple: it uses long runs of spaces as the column separator. It's rigid, but it seems to be the only option we have for now. – Grec001 Aug 13 '23 at 05:13
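The long-space idea could be sketched roughly as follows in base R, applied to one page of `pdftools::pdf_text()` output, which keeps the page layout so the two columns appear side by side on each line. This is not the code from the linked post; the function name `split_two_columns()` and the `min_gap = 3` threshold are hypothetical choices. As noted above, the heuristic is rigid and will misfire if a column itself contains a run of that many spaces.

```r
# Hypothetical helper: split one page of pdftools::pdf_text() output into a
# left and a right column, treating any run of `min_gap` or more spaces as
# the column separator (a rigid heuristic based on the long-space idea).
split_two_columns <- function(page_text, min_gap = 3) {
  lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]
  sep <- paste0(" {", min_gap, ",}")
  left <- character(length(lines))
  right <- character(length(lines))
  for (i in seq_along(lines)) {
    m <- regexpr(sep, lines[i])
    pos <- as.integer(m)
    if (pos == -1L) {
      # no wide gap on this line: treat the whole line as left-column text
      left[i] <- lines[i]
    } else {
      len <- attr(m, "match.length")
      left[i] <- substr(lines[i], 1L, pos - 1L)
      right[i] <- substr(lines[i], pos + len, nchar(lines[i]))
    }
  }
  left <- trimws(left)
  right <- trimws(right)
  # drop empty fragments and rejoin each column as running lines
  list(left = paste(left[left != ""], collapse = "\n"),
       right = paste(right[right != ""], collapse = "\n"))
}
```

A call on a real document might look like `split_two_columns(pdftools::pdf_text("paper.pdf")[1])`, with `"paper.pdf"` as a placeholder file name.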

