I am trying to extract tables from a PDF file. After trying several different packages, I found that tabula is the one that extracts the tables from my PDF correctly. The catch is that each table has a title above it, and the title is not part of the table itself.
import tabula.io as tb
file_path = ""
# read_pdf returns a list of DataFrames, one per table detected on the page
tables = tb.read_pdf(file_path, pages="1")
I would like to extract the title of each table as well. I tried other packages, but they also extract text from inside the table, and I can't tell whether a given piece of text is inside or outside the table.
*I have tried camelot as well. I know it can extract text from the whole page, but it messes up my table format.
Is there any way to extract only the text outside the tables, or any suggestion for extracting each table and its title at the same time?
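One direction that might be worth trying, as an untested sketch rather than a confirmed solution: ask tabula for JSON output, which should include each table's position, and then read the thin strip just above each table as plain text. The helper names `title_area` and `extract_titles`, the 40-point strip height, and the assumption that each JSON entry carries `top`, `left` and `width` keys are all illustrative and should be checked against the installed tabula-py version.

```python
def title_area(table, max_height=40.0):
    """Return [top, left, bottom, right] for the strip just above a table.

    `table` is assumed to be a dict with "top", "left" and "width" keys
    (in PDF points), as tabula's JSON output is expected to provide.
    `max_height` is a guess at how tall the title line is.
    """
    top = float(table["top"])
    left = float(table["left"])
    right = left + float(table["width"])
    # Clamp at 0 so the strip never starts above the page.
    return [max(0.0, top - max_height), left, top, right]


def extract_titles(file_path, page="1", max_height=40.0):
    """Hypothetical helper: one title string per table found on `page`."""
    # Imported lazily so title_area can be used without these packages.
    import pandas as pd
    import tabula

    # JSON output describes each detected table, including its position.
    tables = tabula.read_pdf(file_path, pages=page, output_format="json")
    titles = []
    for table in tables:
        # Re-read only the strip above the table, with table guessing off.
        strips = tabula.read_pdf(
            file_path,
            pages=page,
            area=title_area(table, max_height),
            guess=False,
            stream=True,
            pandas_options={"header": None},
        )
        cells = [
            str(cell)
            for df in strips
            for cell in df.to_numpy().ravel()
            if pd.notna(cell)
        ]
        titles.append(" ".join(cells))
    return titles
```

Calling `extract_titles(file_path, page="1")` would then give a list of titles in the same order as the tables, so each title can be paired with its table. The strip height probably needs tuning per document, and a title placed far above its table would be missed.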
Thanks!
Reference table image from: https://pspdfkit.com/guides/ios/customizing-the-interface/changing-the-document-title/