I extracted all the text from a PDF using tabula, and it works well. However, my PDF has borderless tables, and in some rows only a single column is present, spanning the width of all 3 columns. For these tables, tabula puts all the text into a single column.

Let me explain with an example. I highlighted the line in the image with a blue arrow. If I remove this line, all the text is extracted into 3 columns, but with the line present everything is extracted into a single column.

Here is my code:

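from tabula import read_pdf
from tabulate import tabulate

# Read every table from the PDF; stream mode is meant for tables without ruling lines
dfs = read_pdf("abc.pdf", pages="all", guess=False, lattice=False, stream=True, multiple_tables=True)

# With multiple_tables=True, read_pdf returns a list of DataFrames, so print each one
for df in dfs:
    print(tabulate(df))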

Is there any way to fix this through tabula's configuration, or can you suggest another library or tool that could help?
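
One possible direction, as a sketch only: tabula-py's read_pdf accepts a columns argument, a list of X coordinates for the column boundaries, which is used in stream mode when guess=False. The boundary values and the helper names below (stream_options, read_with_fixed_columns, HYPOTHETICAL_BOUNDARIES) are made up for illustration and are not taken from the PDF in question; the real coordinates would have to be measured in the document.

```python
def stream_options(column_boundaries):
    """Build keyword arguments for tabula.read_pdf with fixed column boundaries."""
    return {
        "pages": "all",
        "guess": False,
        "lattice": False,
        "stream": True,
        "multiple_tables": True,
        # X coordinates where one column ends and the next begins
        "columns": list(column_boundaries),
    }


def read_with_fixed_columns(pdf_path, column_boundaries):
    # Imported lazily so stream_options can be used without tabula-py and Java installed
    from tabula import read_pdf

    return read_pdf(pdf_path, **stream_options(column_boundaries))


# Hypothetical boundaries for a three-column layout (assumption, not measured)
HYPOTHETICAL_BOUNDARIES = [200.0, 400.0]
options = stream_options(HYPOTHETICAL_BOUNDARIES)
print(options["columns"])
```

Calling read_with_fixed_columns("abc.pdf", HYPOTHETICAL_BOUNDARIES) would then return a list of DataFrames. With fixed boundaries the full-width row may itself end up split across the columns, so this is something to test rather than a guaranteed fix.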
