I have been trying to extract text from a PDF with Python, after a tool was built to do the same extraction in Java with PDFBox.
While the Java implementation works on this same PDF, I have been struggling to reproduce it in Python: pdfminer, pypdf and PyPDF2 have all failed to extract the PDF line by line with its spaces. In particular, pdfminer's pdf2txt, for some bizarre reason, splits the page into 3 columns and then reads each column line by line.
The closest I've gotten was using the implementation from a Stack Overflow question, which unfortunately does not keep the spaces. Because some adjacent variables both contain numbers, losing the spaces merges their values together, so I cannot recover them from the extracted text.
Given this, is it possible in Python to extract a PDF line by line with its whitespace preserved?
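One possible approach, sketched below under stated assumptions, is to skip pdfminer's text-box grouping (which is what produces the column split) and rebuild each line yourself from character coordinates: group glyphs whose baselines are close, sort each group left to right, and turn horizontal gaps into spaces. This is not what PDFBox does internally; it is a swapped-in heuristic. The helper names `glyphs_to_lines` and `pdf_page_glyphs` are hypothetical, and the `char_width` (assumed width of one space in PDF units) and `y_tolerance` defaults are guesses you would need to tune for your document. Only `pdf_page_glyphs` needs pdfminer.six; it uses `pdfminer.high_level.extract_pages` and `LTChar` coordinates.

```python
from typing import Iterable, Iterator, List, Tuple

# One glyph: (x0, x1, y0, text) in PDF units; y grows upward (origin bottom-left).
Glyph = Tuple[float, float, float, str]


def glyphs_to_lines(glyphs: Iterable[Glyph], y_tolerance: float = 2.0,
                    char_width: float = 5.0) -> List[str]:
    """Rebuild text lines from positioned glyphs, turning horizontal gaps into spaces.

    y_tolerance: glyphs whose y0 differ by at most this much count as one line (assumed).
    char_width: assumed width of one space in PDF units (heuristic, tune per document).
    """
    glyphs = list(glyphs)
    if not glyphs:
        return []
    left_margin = min(g[0] for g in glyphs)  # so indentation is kept relative to the page

    # Group glyphs into rows, top of the page first.
    rows: List[Tuple[float, List[Glyph]]] = []
    for g in sorted(glyphs, key=lambda g: -g[2]):
        if rows and abs(rows[-1][0] - g[2]) <= y_tolerance:
            rows[-1][1].append(g)
        else:
            rows.append((g[2], [g]))

    lines = []
    for _, row in rows:
        row.sort(key=lambda g: g[0])  # left to right
        parts = []
        cursor = left_margin  # right edge of the previous glyph on this line
        for x0, x1, _, text in row:
            n_spaces = int(round((x0 - cursor) / char_width))
            if n_spaces > 0:
                parts.append(" " * n_spaces)
            parts.append(text)
            cursor = x1
        lines.append("".join(parts))
    return lines


def pdf_page_glyphs(path: str) -> Iterator[List[Glyph]]:
    """Yield the glyphs of each page; requires pdfminer.six (pip install pdfminer.six)."""
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar

    def walk(obj):
        # Collect every LTChar, descending through boxes, lines and figures.
        if isinstance(obj, LTChar):
            yield (obj.x0, obj.x1, obj.y0, obj.get_text())
        elif hasattr(obj, "__iter__"):
            for child in obj:
                yield from walk(child)

    for page in extract_pages(path):
        yield list(walk(page))
```

Usage would look like `for page in pdf_page_glyphs("input.pdf"): print("\n".join(glyphs_to_lines(page)))`, where `input.pdf` is a placeholder. Because gaps are converted to a proportional number of spaces, two numeric fields that sit apart on the page stay separated in the output. Recent pypdf releases also offer `page.extract_text(extraction_mode="layout")`, which may be worth trying first if your installed version supports it.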