I parsed 3 documents to fetch tables. The results are as follows:
- Document 1: Perfect parsing.
- Document 2: Got Jul 16, 2019 5:25:42 PM org.apache.pdfbox.pdmodel.font.PDType1Font WARNING: Using fallback font NimbusSanL-Bold for Univers-Bold. Not sure if this is related, but the second page was parsed and the first one was not.
- Document 3: Got Jul 17, 2019 10:21:25 AM org.apache.pdfbox.pdmodel.font.PDType1Font WARNING: Using fallback font NimbusSanL-Regu for Univers. Nothing was parsed from this one.
These are the current tabula parsing settings:
    import tabula

    # filename is the path to the PDF being parsed
    rows = tabula.read_pdf(filename,
                           pages='all',
                           silent=True,
                           pandas_options={
                               'header': None,
                               'error_bad_lines': False,
                               'warn_bad_lines': False
                           })
Are there other settings that might solve this particular problem?
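To narrow this down, one option would be to run the same file through a few extraction modes and compare how many tables each returns. The sketch below is only a rough diagnostic, not a known fix: `try_extraction_modes` is a hypothetical helper name, the set of modes is my own choice, and it is an assumption that any of them changes the result for these files (the font warnings may well be unrelated). `lattice`, `stream`, `guess` and `multiple_tables` are existing `tabula.read_pdf` options.

```python
def try_extraction_modes(read_pdf, filename):
    """Call read_pdf once per extraction mode and record the table count.

    read_pdf is passed in (e.g. tabula.read_pdf) so the helper itself
    has no hard dependency on tabula or Java.
    """
    # Assumption: these four combinations are worth comparing; the list
    # is illustrative, not exhaustive.
    modes = {
        "default": {},
        "lattice": {"lattice": True},    # tables with ruling lines
        "stream": {"stream": True},      # tables separated by whitespace
        "stream_no_guess": {"stream": True, "guess": False},
    }
    counts = {}
    for name, extra in modes.items():
        try:
            # multiple_tables=True makes read_pdf return a list of tables
            tables = read_pdf(filename, pages="all",
                              multiple_tables=True, silent=True, **extra)
        except Exception as exc:
            # Keep going so one failing mode does not hide the others
            counts[name] = f"failed: {exc}"
            continue
        counts[name] = len(tables)
    return counts


# Usage (requires tabula-py and a Java runtime):
#   import tabula
#   print(try_extraction_modes(tabula.read_pdf, filename))
```

If one mode returns tables for the first page of Document 2 or for Document 3 while the default does not, that would suggest the problem is table detection rather than the font fallback.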