I'm not sure whether this behaviour is normal, but tabula parses my PDF inconsistently.
A one-liner: pdf = tabula.read_pdf(path, pages=pages)
where path is the path to the PDF file. When I print pdf in the console, some values come out correctly, e.g. Ceiling materials, but the next parsed line contains stray whitespace, e.g. C eiling ma terials.
Here is a picture:
The same happens to a series of numbers: they are imported with extra whitespace too.
Does anyone know why this is the case, and possibly how to avoid the redundant whitespace?
I mean, it doesn't make sense that one line is parsed perfectly fine whilst the other's not. Bug?
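
For the numbers at least, a stopgap I can think of is to strip the whitespace after parsing. This is only a minimal sketch: the helper name strip_digit_gaps and the sample values are made up for illustration, and it assumes that whitespace inside a number-like value is never meaningful. It does not explain the cause, and it cannot repair text such as C eiling ma terials, because there is no way to tell which spaces are real.

```python
import re

# Matches strings made up only of digits, whitespace and common number punctuation.
_NUMERIC_WITH_GAPS = re.compile(r"^[\d\s.,+\-]+$")


def strip_digit_gaps(value):
    """Remove whitespace from values that look like numbers; leave everything else alone."""
    if (
        isinstance(value, str)
        and re.search(r"\d", value)          # must contain at least one digit
        and _NUMERIC_WITH_GAPS.match(value)  # and nothing but number-like characters
    ):
        return re.sub(r"\s+", "", value)
    return value


# Hypothetical cell values of the kind described above.
cells = ["1 23 4.5", "C eiling ma terials", "Ceiling materials", 42]
cleaned = [strip_digit_gaps(c) for c in cells]
print(cleaned)
```

On a parsed table this could presumably be applied per column, e.g. with pandas' Series.map(strip_digit_gaps), but I would rather avoid the stray whitespace in the first place.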