
I am trying to extract data from a PDF using the R tabulizer package. However, I get an error when I try to convert the data to a data frame and export it to CSV. My code is below. Could someone help me with this?

# Library packages

# Rtools is a build toolchain for Windows, not an R package: it is installed
# separately and cannot be installed with install.packages() or loaded with library()

if (!require(rJava)) {         
  install.packages("rJava", dep=TRUE)      
  library(rJava)
}

if (!require(tabulizer)) {         
  install.packages("tabulizer", dep=TRUE)      
  library(tabulizer)
}

rm(list = ls())
setwd("MyPath")

site <- "https://cptnacional.org.br/component/jdownloads/send/36-conflitos-por-terra-ocorrencias/14151-conflitos-por-terra-ocorrencias-2018?Itemid=0"


# default call with no parameters changed
matrix_results <- extract_tables(site)

# get back the tables as data frames, keeping their headers
df_results <- extract_tables(site, output = "data.frame")

first_df <- df_results[[1]]
View(first_df)

text <- extract_text(site)

# print text
cat(text)


# note: this writes the raw extracted text, not the extracted tables
write.csv(text, file = "test.csv")
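For the step of turning the extracted tables into a data frame and exporting it, a minimal sketch in base R is shown below. It assumes the default `extract_tables()` result is a list of character matrices with the header in the first row; the `tables` list here is a hypothetical stand-in with made-up values, and `matrix_to_df` and `expected_cols` are illustrative names, not part of tabulizer.

```r
# Hypothetical stand-in for the default extract_tables() result:
# one character matrix per detected table, header assumed to be in row 1.
tables <- list(
  matrix(c("Municipio", "Familias", "A", "10", "B", "20"), ncol = 2, byrow = TRUE),
  matrix(c("Municipio", "Familias", "C", "30"), ncol = 2, byrow = TRUE)
)

# Convert one matrix into a data frame, promoting the first row to column names.
matrix_to_df <- function(m) {
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(df) <- m[1, ]
  df
}

# Keep only tables with the expected column count so that rbind() does not fail
# on pages where the table was detected with a different layout.
expected_cols <- 2
usable <- Filter(function(m) ncol(m) == expected_cols, tables)

# Stack the per-page tables into a single data frame.
combined <- do.call(rbind, lapply(usable, matrix_to_df))

# Write the data frame (not the raw text) to CSV, without row names.
out_file <- file.path(tempdir(), "test.csv")
write.csv(combined, file = out_file, row.names = FALSE)
```

With a real PDF, tables that do not share the same columns would be dropped by the filter above, so it may be worth inspecting `sapply(tables, ncol)` first to see which pages differ.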
  • A lot of manual cleaning is required when converting tables directly from a PDF. I can't say for sure what error you are getting without actually looking at the table you're trying to extract. – Adam Quek Aug 19 '19 at 01:23
  • The data are in this PDF: https://cptnacional.org.br/component/jdownloads/send/36-conflitos-por-terra-ocorrencias/14151-conflitos-por-terra-ocorrencias-2018?Itemid=0 – Caíque Melo Aug 19 '19 at 01:35
  • Your PDF is 59 pages long. It will be much easier for people to help you if you try to narrow down where you run into trouble. What's the error? – camille Aug 19 '19 at 04:14
