
I'm not sure whether this behaviour is normal, but tabula parses my PDF inconsistently.

The call is a one-liner: pdf = tabula.read_pdf(path, pages=pages)

Here path is the path to the PDF file. When I print the result in the console, some values are parsed correctly on one line, e.g. Ceiling materials, while on the next line the same kind of value comes out with stray spaces, e.g. C eiling ma terials.

Here is a picture:

[image]

The same happens to a series of numbers: they are imported with extra whitespace too.

Does anyone know why this happens, and how to avoid the redundant whitespace?

It doesn't make sense that one line is parsed perfectly fine while the next one is not. Is this a bug?
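For the numbers at least, a post-processing workaround might look like the sketch below. It does not address whatever causes the spaces during extraction, and it cannot repair the split words. collapse_numeric_spaces is a hypothetical helper name, not part of tabula, and the sketch assumes each cell holds a single number, so any space inside it is spurious.

```python
import re

# Matches a cell made up only of an optional sign, digits, separators and
# whitespace, with at least one digit, e.g. "1 2 3.4 5" or "- 1,0 00".
# Assumption: such a cell holds ONE number, so inner spaces are spurious.
_NUMERIC_WITH_SPACES = re.compile(r"[-+]?[\d\s.,]*\d[\d\s.,]*")


def collapse_numeric_spaces(cell):
    """Remove whitespace inside number-like strings; leave other values alone."""
    if not isinstance(cell, str):
        return cell  # e.g. values already parsed as int/float
    stripped = cell.strip()
    if _NUMERIC_WITH_SPACES.fullmatch(stripped):
        return re.sub(r"\s+", "", stripped)
    return cell  # text such as "C eiling ma terials" is not touched
```

If the parsed table is a pandas DataFrame, this could be applied per column with something like df[col].map(collapse_numeric_spaces).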

user13581602
