Exporting Table from a pdf file

Question

I need to extract a table from a PDF and select particular columns. I have managed to extract it with "tabulate" and "tabula", but the result is not in the proper format. The original file has 5 columns, but after exporting I get only 3 in total, because the first three columns are treated as one for some reason.

Here is the original table: (image)

Below is my code with the output: (image)

Thanks for the comment. Exactly, the column that is treated as one originally consists of 3 columns, so I am trying to work out how to separate them. – YIF99, Nov 10 '21 at 11:07
Please provide enough code so others can better understand or reproduce the problem. – Community, Nov 11 '21 at 05:09
Imported packages: tabulate, pandas, tabulate.io. Code: file = "name of file"; dfs = read_pdf(file, pages="all", pandas_options={'header': None}); dfs[0].columns = ["Structure ", "Latitude", "Longitude"]; lat = dfs[0][3:]; lat.iloc[0]["Longitude"] – YIF99, Nov 11 '21 at 10:03
The reason I assigned 3 column names instead of 5 is that it does not run any other way. I got the value '16° 03’ 53.80628”' from one column. Now I am trying to convert it to decimal. – YIF99, Nov 11 '21 at 10:05
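
For the last step mentioned in the comments, converting a value such as '16° 03’ 53.80628”' to decimal degrees, a minimal sketch could look like the following. The function name `dms_to_decimal` and the regular expression are illustrative assumptions, not part of the original code; the pattern assumes the degree, minute and second marks appear as in the quoted value (straight or curly quotes) and does not handle hemisphere letters such as N/S/E/W.

```python
import re

# Degrees, minutes, seconds separated by °, a minute mark and a second mark.
# Both straight and typographic quote characters are accepted (an assumption
# about how the PDF text is extracted).
_DMS = re.compile(
    r"(-?\d+(?:\.\d+)?)\s*°\s*"    # degrees, optionally negative
    r"(\d+(?:\.\d+)?)\s*[’'′]\s*"   # minutes
    r'(\d+(?:\.\d+)?)\s*[”"″]'      # seconds
)


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to decimal degrees."""
    match = _DMS.search(text)
    if match is None:
        raise ValueError(f"not a recognised DMS value: {text!r}")
    degrees, minutes, seconds = (float(g) for g in match.groups())
    # Keep the sign of the degrees part and apply it to the whole value.
    sign = -1.0 if match.group(1).startswith("-") else 1.0
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)


# The value quoted in the comment above comes out at approximately 16.06495.
print(dms_to_decimal("16° 03’ 53.80628”"))
```

With pandas, the same function could be applied to a whole column, for example `dfs[0]["Longitude"].map(dms_to_decimal)`, provided every cell in that column matches the pattern.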

score 0 · Answer 1 · answered Nov 09 '21 at 18:03


Try this:

from tabula import read_pdf
dfs = read_pdf(file, pages="all", pandas_options={'header': None})


Wilian


Thank you very much, now I get tables without column names, which I can assign manually. What I cannot understand is why it creates a table with 3 columns rather than the 5 in the original table. – YIF99 Nov 10 '21 at 10:54
Maybe because of the merged cells. I would have to test it with your pdf to be sure. – Wilian Nov 10 '21 at 10:58
Thank you for your kind support. I am sorry, but I do not have the right to share the file. – YIF99 Nov 10 '21 at 11:27
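
If merged cells are indeed the cause, one thing that may be worth trying is giving tabula-py more hints about the table layout. The sketch below is untested against the PDF in question, which was not shared. The helper names `tabula_options` and `read_tables` are hypothetical; `lattice`, `stream`, `guess` and `columns` are keyword arguments of tabula-py's `read_pdf`, and any column edge values passed in are placeholders that would have to be measured from the actual PDF (x-coordinates in points).

```python
def tabula_options(column_edges=None):
    """Build keyword arguments for tabula-py's read_pdf.

    Without column_edges, lattice mode is requested, which relies on ruling
    lines drawn in the PDF. With column_edges, stream mode is used and the
    given x-coordinates are passed as explicit column boundaries.
    """
    opts = {"pages": "all", "pandas_options": {"header": None}}
    if column_edges:
        # guess=False is set so automatic table detection does not interfere
        # with the explicit boundaries (an assumption about typical usage).
        opts.update(stream=True, guess=False, columns=list(column_edges))
    else:
        opts["lattice"] = True
    return opts


def read_tables(path, column_edges=None):
    """Read all tables from path using the options built above."""
    from tabula import read_pdf  # tabula-py, which requires Java
    return read_pdf(path, **tabula_options(column_edges))
```

A call such as `read_tables(file)` tries lattice mode first; if the first three columns still come back merged, `read_tables(file, column_edges=[...])` with measured boundaries is the next thing to try.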
