
I am extracting tables from a PDF and writing them to a CSV file. When I run the code, the tables are not written to the CSV file properly.

Here is my code:

library(tabulizer)

location <- 'http://keic.mica-pps.net/wwwisis/ET_Annual_Reports/Religare_Enterprises_Ltd/RELIGARE-2017-2018.pdf'

out <- extract_tables(location)

for (i in seq_along(out)) {
    # out is a list of matrices, so each table must be extracted with [[ ]]
    write.table(out[[i]], file = 'Output.csv', append = TRUE, sep = ",",
                quote = FALSE)
}

I have enclosed a screenshot of the output file, in which you can see that the tables are incomplete.

Any help would be appreciated.

stefan
Sri Priya
  • Here's another SO Q&A with an alternate approach to extraction after Tabula failures: https://stackoverflow.com/questions/67489987/pdf-scraping-get-company-and-subsidiaries-tables/67658530#67658530 – IRTFM Jun 21 '21 at 22:33

2 Answers


Dealing with PDFs can be very hard and is very specific to the files you have at hand. You will probably need to do a lot of tweaking to get the data into a usable format.

Have a look at this script for an example (https://github.com/b-rodrigues/stats_historiques/blob/master/stats_historiques.R) and at the results (https://twitter.com/brodriguesco/status/1405995811863945223).
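With tabulizer specifically, the usual knobs to tweak are the extraction method and, for stubborn pages, a hand-picked region. A rough sketch of that kind of tweaking (the page number below is a placeholder, not taken from this PDF):

```r
library(tabulizer)

location <- 'http://keic.mica-pps.net/wwwisis/ET_Annual_Reports/Religare_Enterprises_Ltd/RELIGARE-2017-2018.pdf'

# Force the extraction algorithm instead of letting tabulizer guess:
# "lattice" tends to work better for ruled tables,
# "stream" for whitespace-separated ones.
out_lattice <- extract_tables(location, method = "lattice")
out_stream  <- extract_tables(location, method = "stream")

# For pages that still come out mangled, restrict extraction to a region.
# locate_areas() opens an interactive widget for clicking out coordinates.
# areas    <- locate_areas(location, pages = 5)
# out_page <- extract_tables(location, pages = 5, area = areas, guess = FALSE)
```

Comparing the "lattice" and "stream" output side by side usually makes it clear which algorithm suits a given report.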

Marcelo Avila

A common pattern I have seen for this is to bind the list into a single data frame first (adapting it to your variable names):

out_df <- do.call("rbind", lapply(out, as.data.frame))

So it is better to first build a data frame and then use write.csv().
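Putting that together, a minimal runnable sketch (the small `out` list below is a stand-in for the list of matrices that `extract_tables()` returns):

```r
# Stand-in for extract_tables() output: a list of character matrices,
# one per extracted table.
out <- list(
  matrix(c("a", "b", "1", "2"), nrow = 2),
  matrix(c("c", "d", "3", "4"), nrow = 2)
)

# Convert each matrix to a data frame, stack them, and write once.
# Note: rbind() requires every table to have the same number of columns,
# which is often not true for tables pulled from different pages.
out_df <- do.call("rbind", lapply(out, as.data.frame))
write.csv(out_df, file = "Output.csv", row.names = FALSE)
```

If the column counts differ between tables, writing each table to its own file (e.g. `sprintf("Output_%02d.csv", i)` inside a loop) avoids the `rbind()` failure.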

Ethan