Read special characters and fonts from PDF using Python

Asked May 22 '18 at 10:45

Active May 22 '18 at 10:52

I have a PDF in which certain table rows contain special characters and fonts (e.g. …). Is there any way to read them properly?

from tabula import read_pdf

# with multiple_tables=True, read_pdf returns a list of DataFrames
df = read_pdf("Tables PDF.pdf", pages='5', lattice=True, multiple_tables=True, encoding='utf-8-sig')
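
Before trying more encodings, it may help to check exactly which characters tabula returned for a problem cell (something like `df[0].iloc[3, 6]`, assuming the table of interest is the first one in the returned list). The helper `describe_chars` below is hypothetical and uses only the standard-library `unicodedata` module:

```python
import unicodedata


def describe_chars(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each character in text."""
    rows = []
    for ch in text:
        # unicodedata.name raises ValueError for unnamed code points
        # (e.g. control characters) unless a default is supplied.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


if __name__ == "__main__":
    # "caf\u00c3\u00a9" is what UTF-8 "café" looks like after being
    # decoded as Latin-1; "\ufb01" is the "fi" ligature.
    for ch, cp, name in describe_chars("caf\u00c3\u00a9 \ufb01"):
        print(cp, name, repr(ch))
```

If the output shows pairs like `Ã©`, the text was most likely decoded with the wrong codec; if it shows ligatures, styled letters or private-use code points, the PDF's font mapping is the more likely cause, and changing `encoding` is unlikely to help.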

I've tried several encodings (utf-8, ascii, utf-8-sig, ISO-8859-1). Is there any other way to do this?
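
As far as I know, the `encoding` argument of `read_pdf` only affects how tabula-py decodes the text it receives from tabula-java, so it cannot change what the PDF itself contains. If a cell already holds typical mojibake such as `Ã©` instead of `é`, the wrongly decoded string can sometimes be repaired afterwards. A minimal sketch, assuming the text was UTF-8 that got decoded as Latin-1 (`fix_mojibake` is a hypothetical helper, not part of tabula):

```python
def fix_mojibake(text: str, wrong: str = "latin-1", right: str = "utf-8") -> str:
    """Undo text that was decoded with `wrong` when its bytes were really `right`.

    Returns the input unchanged if the round trip fails, which usually means
    the text was not mis-decoded in this particular way.
    """
    try:
        # Recover the original bytes, then decode them with the correct codec.
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    garbled = "caf\u00c3\u00a9"  # "cafÃ©": UTF-8 bytes for "café" read as Latin-1
    print(fix_mojibake(garbled))
```

The fallback keeps correctly decoded text intact in most cases, but a correct string whose Latin-1 bytes happen to form valid UTF-8 would still be altered, so it is safer to apply this only to cells known to be garbled.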

I also tried reading one of the values separately and fixing it with:

df1.iloc[3, 6] = df1.iloc[3, 6].encode("utf-8", "replace")

That didn't work either. Any help would be appreciated.
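
The `.encode("utf-8", "replace")` attempt stores a `bytes` object in the cell rather than a corrected string, which is probably why it changed nothing useful. If the "special fonts" turn out to be Unicode compatibility characters (ligatures such as `ﬁ`, mathematical bold or italic letters, superscripts), NFKC normalization may fold them into plain text. A sketch under that assumption; `normalize_cell` is a hypothetical helper, and it will not recover characters the PDF font maps to wrong or private-use code points:

```python
import unicodedata


def normalize_cell(value):
    """Fold compatibility characters (ligatures, styled letters, superscripts)
    into their plain equivalents with NFKC; leave non-strings untouched."""
    if isinstance(value, str):
        return unicodedata.normalize("NFKC", value)
    return value


if __name__ == "__main__":
    # str.encode returns bytes, not a cleaned-up str.
    print(type("caf\u00e9".encode("utf-8", "replace")))

    # NFKC maps the ligature U+FB01 to "fi" and bold U+1D400 to "A".
    print(normalize_cell("\ufb01nal \U0001d400"))
    print(normalize_cell(42))
```

For a DataFrame such as `df1` in the question, it could be applied column by column with `df1 = df1.apply(lambda col: col.map(normalize_cell))`.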

edited May 22 '18 at 10:52

PratikSharma