
Recently I tried using tabula to parse a table in a PDF that has no lines separating the fields of the table.

The result is a list in which all the different fields are combined into one string (example of output).

How do I convert this single string into a DataFrame so I can manipulate the numbers? Thank you very much.

1 Answer


No sample file is given in the question to test against, but if the PDF table has no separator lines between its columns and everything is merged into one column after extraction with tabula, try the 'columns' parameter of tabula.read_pdf.

According to the tabula-py documentation, this parameter works like this:

columns (list, optional) –
X coordinates of column boundaries.

So, if every PDF has the same layout, you can find the X coordinates of the boundaries at which you want to split the columns. To find them, you can use any PDF tool such as Adobe, or work them out by trial and error.
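A minimal sketch of how this could look, since no sample PDF is available to test against. The file name and the boundary values in the usage comment are made up for illustration; the helper names `build_read_options` and `read_borderless_table` are hypothetical and not part of tabula-py. Passing `guess=False` and `stream=True` alongside `columns` is an assumption on my part: those options exist in tabula.read_pdf, and turning off guessing seems to be the usual way to stop Tabula's own detection from competing with the explicit boundaries, but check the result on your own file.

```python
def build_read_options(column_edges, pages=1):
    """Collect keyword arguments for tabula.read_pdf (hypothetical helper)."""
    # Sort the boundaries left to right and make sure they are numbers.
    edges = sorted(float(x) for x in column_edges)
    if not edges:
        raise ValueError("at least one column boundary is required")
    return {
        "pages": pages,
        "columns": edges,   # X coordinates of column boundaries
        "guess": False,     # assumption: rely on the explicit boundaries
        "stream": True,     # assumption: table has no ruling lines
    }


def read_borderless_table(pdf_path, column_edges, pages=1):
    """Return the first table found on the page as a DataFrame, or None."""
    import tabula  # tabula-py; imported here because it needs Java at runtime

    tables = tabula.read_pdf(pdf_path, **build_read_options(column_edges, pages))
    return tables[0] if tables else None


# Hypothetical usage; replace the path and boundaries with your own:
# df = read_borderless_table("report.pdf", [120, 250, 380])
# print(df.head())
```

If the columns still come out merged or split in the wrong place, adjust the boundary values a little at a time and rerun until each field lands in its own column.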

If you are still in doubt, please attach a sample PDF so someone can look into it.