Python3: tabula-py imports several strings with random whitespaces

Asked Apr 07 '21 at 20:28

Active Apr 07 '21 at 20:28

Viewed 146 times

I'm not sure whether this behaviour is normal, but tabula-py reads my PDF inconsistently.

A one-liner: pdf = tabula.read_pdf(path, pages=pages)

Here path is the path to the PDF file. When I print the result in the console, some values, e.g. Ceiling materials, are parsed correctly, while the same value on the next line comes out with stray whitespace, e.g. C eiling ma terials.

Here is a picture:

The same happens to a series of numbers: they are imported with stray whitespace too.

Does anyone know why this happens, and how to avoid the redundant whitespace?

It doesn't make sense to me that one line is parsed perfectly fine whilst the next is not. Is this a bug?
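One possible workaround is to clean the values after parsing rather than change how the PDF is read. The sketch below is only an illustration under stated assumptions: squash and repair_cells are hypothetical helpers (not part of tabula-py), the input is a plain list of cell strings, every damaged label also appears at least once in its correct form, and whitespace inside a number is never meaningful.

```python
import re

# A cell counts as numeric if, once whitespace is removed, it is an
# optional sign followed by digits, dots and commas (an assumption).
NUMBER = re.compile(r"[-+]?\d[\d.,]*")


def squash(text):
    """Remove every whitespace character from text."""
    return re.sub(r"\s+", "", text)


def repair_cells(cells):
    """Hypothetical helper: undo stray whitespace in parsed cell strings.

    Labels are mapped to the shortest spelling that shares the same
    whitespace-free form; numeric cells simply lose all whitespace.
    """
    # Pass 1: for each whitespace-free key, remember the shortest variant,
    # i.e. the one containing the fewest whitespace characters.
    canonical = {}
    for cell in cells:
        key = squash(cell)
        best = canonical.get(key)
        if best is None or len(cell) < len(best):
            canonical[key] = cell

    # Pass 2: rewrite every cell using the table built above.
    repaired = []
    for cell in cells:
        key = squash(cell)
        if NUMBER.fullmatch(key):
            repaired.append(key)
        else:
            repaired.append(canonical[key])
    return repaired


cells = ["Ceiling materials", "C eiling ma terials", "1 2 3.4 5", "Floor"]
print(repair_cells(cells))
# ['Ceiling materials', 'Ceiling materials', '123.45', 'Floor']
```

This does not explain why the stray whitespace appears in the first place, and it cannot repair a label that never shows up in its correct form. Applying it to the values returned by read_pdf would still need to be adapted to the actual table layout.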

user13581602

NB: I'm using tabula-py 2.2.0 and Python 3.8.1 – user13581602 Apr 07 '21 at 20:31
