I spent some time trying to extract data using PyPDF2, but it turned out to be unreliable across PDFs, even when the PDFs looked (to the eye) like they had a similar structure and were probably computer generated.
What I liked about PyPDF2 is that its extractText function goes through the PDF file and pulls the text from the various objects, so (as far as I understand) you don't have to deal with the spacing between characters.
Camelot, on the other hand, uses pdfminer according to the docs. As far as I understand, pdfminer doesn't do the above; instead it groups the pieces of the PDF from characters into words and from words into lines, based on distance rules. The problem I ran into with Camelot is that you get results like "He l lo Wo rld".
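To illustrate what I mean by distance rules, here is a toy sketch of how I understand this kind of grouping. It is not pdfminer's actual code: the character positions, the `layout` helper, and the `group_chars` function are all made up for illustration. A space is inserted whenever the gap between two characters exceeds a fraction (`word_margin`) of the character width, so uneven spacing inside a word can split it.

```python
from typing import List, Tuple

# (character, x0, x1): hypothetical left and right edges of a glyph, in points
Char = Tuple[str, float, float]


def layout(text: str, gaps: List[float], width: float = 5.0) -> List[Char]:
    """Place characters left to right with the given gaps between them (made-up data)."""
    chars = []
    x = 0.0
    for i, ch in enumerate(text):
        chars.append((ch, x, x + width))
        if i < len(gaps):
            x += width + gaps[i]
    return chars


def group_chars(chars: List[Char], word_margin: float) -> str:
    """Join characters, adding a space when the gap to the previous character
    is larger than word_margin times the current character's width."""
    out = []
    prev_x1 = None
    for ch, x0, x1 in sorted(chars, key=lambda c: c[1]):
        if prev_x1 is not None and (x0 - prev_x1) > word_margin * (x1 - x0):
            out.append(" ")
        out.append(ch)
        prev_x1 = x1
    return "".join(out)


# "Hello World" with uneven spacing inside the words; the 3.0 gap is the real space.
GAPS = [0.2, 1.0, 1.0, 0.2, 3.0, 0.2, 1.0, 0.2, 0.2]
CHARS = layout("HelloWorld", GAPS)

print(group_chars(CHARS, word_margin=0.1))  # strict threshold splits the words
print(group_chars(CHARS, word_margin=0.4))  # looser threshold keeps them whole
```

With these made-up positions the strict threshold reproduces the broken output, while the looser one gives "Hello World". If I read the docs correctly, Camelot's read_pdf accepts a layout_kwargs dict that is passed on to pdfminer's LAParams (which has a word_margin setting), so that may be the relevant knob, though whether it fixes my PDFs is an open question.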
Unfortunately, I can't share an example PDF online.
Let me know what other information would be helpful to share.