While trying to read the daily proceedings of a Parliament, I discovered that they are split across many PDF files, which cannot simply be opened in the browser and must be downloaded individually. My basic idea is to download all the documents and extract the titles of all the decisions taken.
Previous threads suggest using PyPDF2, but it does not work at all in my case. The characters in the PDF are Greek letters, so perhaps the encoding has something to do with it. On top of that, some pictures are added at the end of the document (these are of no interest to me).
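One way to test the encoding suspicion is to measure how much of the extracted text is actually Greek. The following is only a sketch under stated assumptions: `greek_ratio` and `extract_text` are hypothetical helper names, and the `PdfReader` and `page.extract_text()` calls assume PyPDF2 3.x is installed.

```python
import unicodedata


def greek_ratio(text):
    """Return the fraction of alphabetic characters in `text` that are Greek."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode character names for Greek letters contain the word "GREEK"
    greek = [ch for ch in letters if "GREEK" in unicodedata.name(ch, "")]
    return len(greek) / len(letters)


def extract_text(path):
    """Concatenate the text of every page (assumes PyPDF2 3.x is installed)."""
    # Imported lazily so greek_ratio can be used without PyPDF2
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # Fall back to "" in case a page (e.g. an image-only one) yields no text
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Calling `greek_ratio(extract_text("decision.pdf"))` on one downloaded file (the file name is a placeholder) gives a rough signal: a value near 0 for a document that visibly contains Greek text would suggest the problem lies in the text extraction and not in the download.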
Is there any chance PyPDF2 can pull this off, or should I look elsewhere?