Can anyone help? After extracting a PDF to JSON with Python's Camelot, the output does not match the PDF exactly: some content is missing after extraction.
Please post the PDF and what you tried. – Stefano Fiorucci - anakin87 Sep 24 '20 at 12:30
https://www.dropbox.com/s/vernt20ntt1z8rt/essart_wochenpla_zwei%20Scheibenhaus%20%281%29.pdf?dl=0 – Goutam Ghosh Sep 24 '20 at 14:02
1 Answer
I tried the following code:
import camelot
pdf_path = '/YOUR/FILEPATH.pdf'
tables = camelot.read_pdf(pdf_path, flavor='stream')
Here are two problems:
- The header font is not properly read, so you find strange characters like (cid:71)...
- Using flavor='lattice', the table isn't detected. Using flavor='stream', the table is detected, but the cells aren't properly detected.
At the moment, I think that Camelot can't properly extract this table. The Camelot maintainers are working on fixing the second problem (see this and this).
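If you want to check both problems on your own file, here is a rough sketch. It runs Camelot once per flavor, counts the cells that contain (cid:NN) placeholders, and returns a small JSON summary. The names count_cid_cells and summarize are my own helpers, not part of Camelot, and the path is a placeholder; I am assuming the usual Camelot attributes (table.df, table.shape, table.parsing_report).

```python
import json
import re

# Matches the "(cid:NN)" placeholders that appear when a glyph
# cannot be mapped back to a text character.
CID_MARKER = re.compile(r"\(cid:\d+\)")


def count_cid_cells(rows):
    """Count the cells (in a list of rows) that contain a (cid:NN) placeholder."""
    return sum(1 for row in rows for cell in row if CID_MARKER.search(str(cell)))


def summarize(pdf_path, flavors=("lattice", "stream")):
    """Run Camelot once per flavor and return a JSON string of diagnostics.

    Hypothetical helper: an empty list for a flavor means no table was detected.
    """
    import camelot  # imported lazily so count_cid_cells works without Camelot

    summary = {}
    for flavor in flavors:
        tables = camelot.read_pdf(pdf_path, flavor=flavor)
        summary[flavor] = [
            {
                "shape": list(table.shape),
                "report": table.parsing_report,
                "cid_cells": count_cid_cells(table.df.values.tolist()),
            }
            for table in tables
        ]
    # default=str guards against values that json cannot serialize directly.
    return json.dumps(summary, indent=2, default=str)
```

Calling print(summarize('/YOUR/FILEPATH.pdf')) should then show, per flavor, whether any table was found and how many cells still contain unreadable characters. This only diagnoses the extraction; it does not repair it.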

Stefano Fiorucci - anakin87
I am sorry that this problem can't be solved using Camelot. If my answer is useful, please mark it as accepted and upvote it. – Stefano Fiorucci - anakin87 Sep 24 '20 at 22:45
But extracttable.com is for converting images to other formats. We need PDF to JSON. – Goutam Ghosh Sep 30 '20 at 11:49