The PDF content is Chinese text (actual characters, not images), so it may use different fonts. My code:
>>> import tabula
>>> df = tabula.read_pdf('/data/proj/smartinvestment/cninfo_download_reports/pdf/601101/2016-12-29/1202969937.PDF', pages='all')
The Error:
Feb 02, 2018 6:44:34 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font ABCDEE+ËÎÌå are not implemented in PDFBox and will be ignored
The final DataFrame is empty.
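The font name in the log line looks garbled. As a quick check, here is a minimal sketch that assumes the name is GBK-encoded bytes that were printed as Latin-1. That is only my guess; the log does not confirm it.

```python
# Assumption: the garbled font name is GBK text that was displayed as Latin-1.
# "ABCDEE+" is the subset prefix from the log line and is left out here.
garbled = "ËÎÌå"

# Recover the raw bytes, then decode them as GBK.
font_name = garbled.encode("latin-1").decode("gbk")
print(font_name)
```

Under that assumption this prints 宋体 (SimSun), a common Chinese font. Note also that the message is logged at INFO level, so it may not be the actual cause of the empty result.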
I could not find anything helpful on Stack Overflow. How can I fix this? Do I need to install additional fonts, or is something else causing the problem?