
Camelot treats some rows as separate when they are not, so text that belongs to the previous row ends up in rows of its own

I'm working with Camelot to extract data from bank statements. The problem is that Camelot treats some rows as separate when they are not. As you can see in the attached image, the transaction on 1/9/2019 is split into 3 rows when it is really only one. This happens whenever the description spans more than one line (original statement attached).

I tried tuning row_tol and col_tol with no success. Is there a solution within Camelot? If not, what would be a quick fix in pandas?

import camelot

tables = camelot.read_pdf('BOA1.pdf', flavor='stream', flag_size=True)
tables


Almog Woldenberg
  • Unrelated suggestion: as someone who used to work in the finance industry, I'd advise not showing confidential information at all. It is better to describe the problem clearly and create a sample DataFrame here. – BENY Apr 05 '19 at 21:36
  • From experience, I think it is difficult to get the result you want with flavor='stream' (although 'stream' is the right flavor for extracting this table). In pandas you could merge each row into the one above it whenever its date cell is empty... – Stefano Fiorucci - anakin87 Apr 09 '19 at 12:55
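Following the suggestion in the comment above, here is a minimal pandas sketch of the "merge rows whose date cell is empty" idea. It is only an illustration under some assumptions: the helper name `merge_continuation_rows`, the column names, and the sample values are all made up, and it assumes every real transaction has a non-empty date cell while wrapped description lines do not. With Camelot you would pass something like `tables[0].df` instead of the sample frame.

```python
import pandas as pd


def merge_continuation_rows(df, key_col):
    """Fold rows with an empty key cell into the nearest keyed row above.

    Hypothetical helper, not part of Camelot or pandas.
    """
    # Normalise every cell to a stripped string so that '' means "empty".
    cells = df.fillna("").astype(str).apply(lambda col: col.str.strip())

    # A non-empty key (e.g. the date) marks the start of a new logical row.
    starts_new_row = cells[key_col].ne("")

    # The running count of starts gives each logical row its own group id.
    # Rows that appear before the first key all land in group 0.
    group_id = starts_new_row.cumsum().to_numpy()

    # Within each group, join the non-empty pieces of every column.
    merged = cells.groupby(group_id).agg(
        lambda values: " ".join(v for v in values if v)
    )
    return merged.reset_index(drop=True)


# Made-up sample that mimics one transaction split across three rows.
sample = pd.DataFrame(
    {
        "Date": ["01/09/19", "", "", "01/10/19"],
        "Description": ["CHECKCARD 0108", "SOME STORE", "CA 123", "DEPOSIT"],
        "Amount": ["-25.00", "", "", "100.00"],
    }
)

merged = merge_continuation_rows(sample, key_col="Date")
print(merged)
```

Joining every column, not only the description, means an amount that Camelot happened to place on a continuation line is still kept. The trade-off is that if two lines of the same transaction both carry a value in a numeric column, the values are concatenated with a space, so check those columns before converting them to numbers.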
