I am trying to convert a PDF to a text file, but the characters specific to the PDF's language (the ones that are not in the English alphabet) are lost in the conversion.
How can I convert a PDF to text without losing any characters?
Thanks in advance.
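import PyPDF2

filename = 'Book1.pdf'
text = ""  # renamed from `str` so the built-in type is not shadowed
# Open the PDF in binary mode; the with-block closes the file afterwards.
with open(filename, 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfFileReader(pdf_file)
    # Concatenate the text extracted from every page.
    for i in range(pdf_reader.getNumPages()):
        text += pdf_reader.getPage(i).extractText()

# Write the extracted text out as UTF-8.
with open("text2.txt", 'w', encoding='utf-8') as f:
    f.write(text)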