Tabula-py font not implemented error

Question

The PDF file contains Chinese text (actual characters, not images), so it may use different fonts. My code:

>>> import tabula
>>> df = tabula.read_pdf('/data/proj/smartinvestment/cninfo_download_reports/pdf/601101/2016-12-29/1202969937.PDF', pages='all')

The Error:

Feb 02, 2018 6:44:34 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font ABCDEE+ËÎÌå are not implemented in PDFBox and will be ignored

The final DataFrame is empty.

I could not find a solution on Stack Overflow. How can I fix this issue? Should I import some fonts, or is something else causing this error?

The "OpenType Layout" message is irrelevant, it is for PDF creation. You should share the PDF. — Tilman Hausherr, Feb 02 '18 at 12:25

score 1 · Answer 1 · answered Aug 29 '19 at 23:25

I feel your pain. However, I am getting data in my dataframe (df) doing similar steps to yours. To troubleshoot, look at the type of your df being returned:

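import tabula

pdf_file_name = "my_filename.pdf"
# 'Ansi' is a Windows-only codec alias; try encoding='utf-8' on other platforms
df = tabula.read_pdf(pdf_file_name,
                     encoding='Ansi')

# A list here means one DataFrame per detected table
print(type(df))
# df.to_csv("output.csv", index=False)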

Because you pass pages="all", it is quite possible that df is a list of DataFrames rather than a single DataFrame. In that case you need to look into each DataFrame in the list to find your data.

Likewise, if the multiple_tables parameter of tabula.read_pdf is set to True, df is a list of DataFrames, and you again need to look into each one to see your data.
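To check each table in the result, something like the following rough sketch may help. The helper names summarize_tables and demo are made up for illustration and are not part of tabula-py. The sketch only assumes that read_pdf returns either one DataFrame or a list of them, and that each has the usual .shape attribute.

```python
def summarize_tables(result):
    """Return (index, rows, columns) for every table in a read_pdf result.

    Hypothetical helper: accepts a single DataFrame or a list of DataFrames,
    since the return type can differ depending on how read_pdf was called.
    """
    tables = result if isinstance(result, list) else [result]
    summary = []
    for index, table in enumerate(tables):
        rows, columns = table.shape  # (0, 0) indicates an empty table
        summary.append((index, rows, columns))
    return summary


def demo(pdf_path):
    """Hypothetical usage; needs tabula-py and a working Java installation."""
    import tabula

    result = tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)
    for index, rows, columns in summarize_tables(result):
        print(f"table {index}: {rows} rows x {columns} columns")
```

Calling demo("my_filename.pdf") prints one line per extracted table. If every table reports 0 rows, or the list itself is empty, the extraction really did find nothing, rather than the data being hidden in a later list element.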
