Is there a way to read the first page of a PDF document from a URL without saving it locally? I need to read the response to a request for a PDF document on a website. Below is the code I tried. It works well with some HTTP URLs but not with others.
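import io

import PyPDF2
import urllib3

urllib3.disable_warnings()

# `url` holds the address of the PDF to fetch
with urllib3.PoolManager() as http:
    r = http.request('GET', url)
    # Wrap the downloaded bytes in an in-memory file so nothing is written to disk
    with io.BytesIO(r.data) as f:
        reader = PyPDF2.PdfFileReader(f)
        contents = reader.getPage(0).extractText().split('\n')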
Here is the output when I run this code with the following URL: "http://www.ain.gouv.fr/IMG/pdf/aprejetdae20210709enligne.pdf"
['', '', '', '', '˘ˇˆ', '˙˝', '˚', '!˛', '˛ ', '', 'ˆ˙ˆ#$%', '$', "#'˙", '( ', '', '', '', '˘ˇˆˇ˙', '˝˘ˇˆˇ˛˚', '˜', ' !"ˇ#ˆ"!$%!"ˇ#&', "ˇ'", '˜', '(', '!"ˇ#ˆˇ!$%!"ˇ#)&*', '˜', '((ˇˇ%!"!ˇ', '+,+-', '(./', '01(', '!,(2$˙', '""˚345', '6', '7((&(1(8', '1ˆ(1((˛.', '˜', '$!"!ˇ(1*(1', '1', ',1˝/9,', '/1(', '˜', '\'%!"!ˇ(1(1', '1,1˝/9,(6', '˜', ')%:(()', '˜', '+,+-(.$!"!ˇ()', '˜', '(!˙%!"!ˇ()5,5,((', '=( ...
Python version: 3.10.0
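One check that may help separate a download problem from a text-extraction problem is to confirm that the bytes returned by the request really are a PDF before handing them to the parser. The following is only a minimal sketch: the helper name `looks_like_pdf` is hypothetical (it is not part of urllib3 or PyPDF2), and it assumes that a valid file carries the `%PDF-` marker somewhere in its first 1024 bytes.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload carries a PDF header near its start.

    Hypothetical helper for illustration only.
    """
    # The header normally sits at offset 0, but some files have a few
    # stray bytes in front of it, so search a small window instead.
    return b"%PDF-" in data[:1024]


if __name__ == "__main__":
    # Stand-in payloads; in the code above this would be called on r.data.
    print(looks_like_pdf(b"%PDF-1.4\n1 0 obj\n"))
    print(looks_like_pdf(b"<html><body>Not found</body></html>"))
```

If the check passes for a URL that still produces unreadable output, the download itself is probably fine and the issue more likely lies in how the text is extracted from that particular file.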