Not detecting columns

Question

I am parsing a bank statement with tabula-py. The columns are separated by vertical margins, but the rows are not separated, so I use stream mode. However, if a page has no entry for some column, tabula merges that column with its neighbor. This happens with the following code:

tables=tabula.read_pdf("pdfname.pdf",pages='all')

So I used the columns option to set the column boundaries manually:

tables=tabula.read_pdf("pdfname.pdf",pages='all',columns= ['27.0,68.0,272.0,357.5,397.0,474.5,553.0,631.0'])

But it does nothing. It is as if tabula is not even reading the option: the output is the same as before. Sorry, I cannot post the table for privacy reasons.

[My table looks roughly like the one in this image: https://i.stack.imgur.com/f40V0.png]

Manuel Aristarán · Answer 1 · 2019-06-29T19:06:43.550

0

The columns keyword argument should be a list of numbers, not a list containing a single comma-separated string:

tables = tabula.read_pdf("pdfname.pdf",
                         pages='all',
                         columns=[27.0,68.0,272.0,357.5,397.0,474.5,553.0,631.0])
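If the coordinates arrive as one comma-separated string (as in the question), they can be converted before the call. The following is a minimal sketch, not part of tabula-py: `parse_columns` and `read_with_columns` are hypothetical helper names, and passing `stream=True` is an assumption based on the question's mention of stream mode.

```python
def parse_columns(spec):
    """Turn a comma-separated string of x-coordinates into a list of floats."""
    return [float(part) for part in spec.split(",") if part.strip()]


def read_with_columns(pdf_path, spec):
    """Hypothetical wrapper: read all pages using explicit column boundaries."""
    # Imported lazily so parse_columns can be used without tabula-py installed.
    import tabula

    return tabula.read_pdf(
        pdf_path,
        pages="all",
        stream=True,  # assumption: stream mode, as described in the question
        columns=parse_columns(spec),  # list of floats, not a list with one string
    )
```

For example, `read_with_columns("pdfname.pdf", "27.0,68.0,272.0,357.5,397.0,474.5,553.0,631.0")` passes eight separate float boundaries to `read_pdf`.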

edited Jun 29 '19 at 19:06

answered Jun 29 '19 at 18:51

Manuel Aristarán


Does it work with the Tabula application? You can get it at https://tabula.technology – Manuel Aristarán Jul 01 '19 at 00:47
No, the Tabula application also merges the columns. – Ayush Bansal Jul 01 '19 at 07:52
I have been having this problem too, and I am sorry to say that whether `columns` is a list (square brackets) or a tuple (round brackets) makes no difference. – cardamom Apr 06 '20 at 08:24

score 0 · Answer 2 · edited Jul 10 '19 at 13:20


As far as I know, tabula-py is just a wrapper around tabula-java, so its extraction accuracy is the same as that of the Tabula app. Try pdfplumber instead.
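A rough sketch of what that could look like with pdfplumber's explicit vertical lines is shown below. The helper names `build_table_settings` and `extract_rows` are hypothetical, and it is an assumption that the same x-coordinates used for tabula apply here. With the explicit strategy, the list probably also needs the left and right edges of the table, not just the interior boundaries.

```python
def build_table_settings(column_xs):
    """Build pdfplumber table settings with fixed vertical column boundaries."""
    return {
        "vertical_strategy": "explicit",          # use only the lines given below
        "explicit_vertical_lines": list(column_xs),
        "horizontal_strategy": "text",            # rows inferred from text alignment
    }


def extract_rows(pdf_path, column_xs):
    """Hypothetical helper: collect table rows from every page of the PDF."""
    # Imported lazily so build_table_settings works without pdfplumber installed.
    import pdfplumber

    settings = build_table_settings(column_xs)
    rows = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            table = page.extract_table(table_settings=settings)
            if table:  # extract_table returns None when no table is found
                rows.extend(table)
    return rows
```

Because the boundaries are fixed per call, a column that happens to be empty on one page should still come out as its own (empty) cell rather than being merged, though this has not been checked against the statement in the question.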

edited Jul 10 '19 at 13:20

blackbrandt


answered Jul 10 '19 at 12:10

chezou

