
I'm using extract_tables() from tabulizer 0.2.2 in R on a Mac, on the following PDF:

library(tabulizer)

sales <- "http://www.greenwichct.org/upload/medialibrary/5cd/Residential-Sales-by-Address-10-10-to-10-15.pdf"

test <- extract_tables(sales, pages = 1:10, method = "decide")

I believe the tables on every page have the same layout, but the resulting list of matrices for the first ten pages, for example, contains matrices with three different dimensions. Columns are sometimes merged; in test[[3]], for example, columns 1 and 2 are concatenated.

I have tried setting the area and specifying different methods. I have also looked for how to specify the columns parameter, but cannot find anything specific. I even went through extract_areas(), with the same result. The Tabula app has the same problems.

Any thoughts appreciated.
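A minimal, untested sketch of how the columns argument of extract_tables() can be supplied, assuming tabulizer 0.2.x: it takes a list with one numeric vector of column boundary x-coordinates (in PDF points) per page, and, as I understand the package documentation, it is ignored unless guess = FALSE. The values in col_bounds are hypothetical placeholders that would have to be measured on the actual PDF, and make_columns() and extract_fixed_columns() are illustrative helper names, not part of tabulizer. Since the original link is reported dead in the comments, the usage line assumes a reachable or local copy of the file.

```r
# Hypothetical x-coordinates (PDF points) of the boundaries between columns.
# Placeholder values only: measure them on your own PDF before use.
col_bounds <- c(95, 210, 300, 360, 430, 500)

# extract_tables() expects one numeric vector of column boundaries per page,
# so repeat the same boundaries for every page being extracted.
make_columns <- function(bounds, n_pages) {
  rep(list(bounds), n_pages)
}

# Wrapper passing explicit column boundaries. guess = FALSE is set because,
# per the tabulizer docs as I read them, columns is ignored when guess = TRUE.
# tabulizer is only needed when this function is actually called.
extract_fixed_columns <- function(file, pages, bounds) {
  tabulizer::extract_tables(
    file,
    pages   = pages,
    columns = make_columns(bounds, length(pages)),
    guess   = FALSE,
    method  = "stream"
  )
}

# Usage (requires tabulizer, Java, and a reachable copy of the PDF):
# test <- extract_fixed_columns(sales, pages = 1:10, bounds = col_bounds)
# sapply(test, ncol)  # compare column counts across pages
```

Fixing the boundaries by hand trades tabula's per-page guessing for one shared layout, which is the point when every page is believed to have the same table structure.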

zx8754
David Lucey
  • The link to the PDF is not valid anymore. Could you provide a link to a new PDF? – Emmanuel Hamel Sep 15 '22 at 22:26
  • This was a messy PDF 4 years ago; if it is gone, there will hopefully never be another one like it. If you are trying to learn how to use tabulizer, there must be better options. – David Lucey Oct 11 '22 at 06:47
  • I am not trying to learn how to use tabulizer. I was just checking if I could help to answer your question ;) – Emmanuel Hamel Oct 11 '22 at 10:44

0 Answers