
I have a problem using the extract_tables function in tabulizer. (N.B. I realise there are potential issues with tabulizer that led to it being removed from CRAN; I wonder whether this is one of them, and whether there is an alternative.)

Here is a script that extracts a table from page 9 of a report for two different years. The first report works but the second doesn't. The format of the report has changed slightly, but I can't really see why that would cause it not to work...

With the simplified code below, I am getting this error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.io.IOException: Error: End-of-File, expected line

However, in my main (not easily reproducible) code, it appeared to be scraping only the top-left cell, containing "Local authority".

Any ideas at all? Is there a good alternative to tabulizer::extract_tables that I could try instead?

library(tabulizer)

# Annual report PDFs for 2018-19 and 2019-20
sco.pdf.url.1 <- "https://www.transport.gov.scot/media/46552/decriminalised-parking-enforcement-income-expenditure-annual-report-2018-19.pdf"
sco.pdf.url.2 <- "https://www.transport.gov.scot/media/48825/decriminalised-parking-enforcement-income-and-expenditure-2019-20-report.pdf"

# Page of the table to extract
sco.pdf.i.e.tab <- 9

# 2018-19 report: works
scotland.tfs.i.e <- extract_tables(sco.pdf.url.1, pages = sco.pdf.i.e.tab, output = "data.frame")[[1]]

# 2019-20 report: fails with the error shown above
scotland.tfs.i.e <- extract_tables(sco.pdf.url.2, pages = sco.pdf.i.e.tab, output = "data.frame")[[1]]
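One check that might narrow this down, offered as a sketch and not a confirmed diagnosis: the "End-of-File" IOException comes from the Java side, and one possible cause (an assumption, not verified for this report) is that the file handed to the parser is incomplete or is not a PDF at all, for example an HTML error page. Downloading the file in binary mode and checking its header before extraction would rule that in or out. The helpers looks_like_pdf and fetch_pdf are hypothetical names introduced here; they are not part of tabulizer.

```r
# Hypothetical helper (not part of tabulizer): TRUE if the file starts with
# the "%PDF-" signature that a PDF normally begins with.
looks_like_pdf <- function(path) {
  header <- readBin(path, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}

# Hypothetical helper: download a URL in binary mode and stop early if the
# result does not look like a PDF.
fetch_pdf <- function(url, dest = tempfile(fileext = ".pdf")) {
  utils::download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  if (!looks_like_pdf(dest)) {
    stop("Downloaded file does not start with %PDF-: ", url)
  }
  dest
}

# Usage (needs network access and tabulizer, so left commented out):
# local.pdf.2 <- fetch_pdf(sco.pdf.url.2)
# extract_tables(local.pdf.2, pages = sco.pdf.i.e.tab, output = "data.frame")
```

If the header check passes and extract_tables still fails on the local copy, the download can be ruled out and the problem more likely lies in how the second report's PDF is structured.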
DJD