I have been using Camelot to extract tables from PDF pages, and it works well. However, it takes around 5 minutes to extract all the tables from a 68-page PDF. In the future, I will need to extract tables from PDFs with over 1,000 pages, and I presume that will take a very long time.

Is there a way to make Camelot run faster, or is there a suitable alternative?

Yaset Arfat
  • The only out-of-the-box solution that comes to mind is to divide the extraction into several parallel extractions, based on the page number. Obviously, to do this, you should have enough computational resources (CPU and RAM). – Stefano Fiorucci - anakin87 Jun 16 '21 at 07:11
  • Yes, that is exactly what I am doing right now, but I wondered whether someone might have a way to make it faster. Do you know of any alternative to Camelot (apart from Tabula)? – Yaset Arfat Jun 17 '21 at 07:35
  • Here you can see other similar tools, compared with Camelot: https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools In any case, Camelot seems better. – Stefano Fiorucci - anakin87 Jun 17 '21 at 08:19
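
The parallel approach suggested in the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (`page_ranges`, `extract_range`, `extract_parallel`), the chunk size, and the worker count are all hypothetical, and the total page count is assumed to be known in advance. The only Camelot call used is `camelot.read_pdf` with its `pages` argument.

```python
from concurrent.futures import ProcessPoolExecutor


def page_ranges(total_pages, chunk_size):
    """Split pages 1..total_pages into range strings such as '1-10'."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        ranges.append(f"{start}-{end}")
    return ranges


def extract_range(job):
    """Worker: extract tables from one page range, return their DataFrames."""
    pdf_path, pages = job
    import camelot  # imported inside the worker process

    tables = camelot.read_pdf(pdf_path, pages=pages)
    # Return plain DataFrames so the results are cheap to send back.
    return [table.df for table in tables]


def extract_parallel(pdf_path, total_pages, chunk_size=10, workers=4):
    """Hypothetical helper: run one extraction per page range in parallel."""
    jobs = [(pdf_path, pages) for pages in page_ranges(total_pages, chunk_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in page order.
        chunks = list(pool.map(extract_range, jobs))
    return [df for chunk in chunks for df in chunk]
```

With a 68-page file, `extract_parallel("report.pdf", total_pages=68)` (the file name is a placeholder) would split the work into seven page ranges. On platforms that start worker processes with spawn, the call should sit under an `if __name__ == "__main__":` guard. Any speedup depends on the number of CPU cores and the available RAM, as noted in the first comment.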

0 Answers