
I need to export a table from a PDF and select particular columns. I have managed to export it with "tabulate"/"tabula", but the result is not in the proper format. The original table has 5 columns, but after exporting I get only 3, because the first three columns are treated as one for some reason.

Here is the original table: (image)

Below is my code with the output:

(two images)

YIF99
  • Thanks for the comment. Exactly: the column that is read as one originally consists of 3 columns, so I am just thinking about how to separate them. – YIF99 Nov 10 '21 at 11:07
  • Please provide enough code so others can better understand or reproduce the problem. – Community Nov 11 '21 at 05:09
  • imported packages (tabulate, pandas, tabulate.io); file = "name of file"; dfs = read_pdf(file, pages="all", pandas_options={'header': None}); dfs[0].columns = ["Structure ", "Latitude", "Longitude"]; lat = dfs[0][3:]; lat.iloc[0]["Longitude"] – YIF99 Nov 11 '21 at 10:03
  • The reason I assigned 3 column names instead of 5 is that it does not run any other way. I got the value '16° 03’ 53.80628”' from one column. Now I am trying to convert it to decimal. – YIF99 Nov 11 '21 at 10:05
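
For the last comment, converting a value such as '16° 03’ 53.80628”' to decimal degrees could look roughly like the sketch below. It is only an illustration: the function name `dms_to_decimal` is made up here, the accepted format (degrees, minutes, seconds, optional N/S/E/W letter) is assumed from the single sample value in the comment, and the real PDF may contain other variants.

```python
import re

# Degrees, minutes, seconds with straight or typographic marks, plus an
# optional hemisphere letter. The format is an assumption based on the one
# sample value quoted in the thread.
_DMS = re.compile(
    r"""(?P<deg>-?\d+(?:\.\d+)?)\s*°\s*
        (?P<min>\d+(?:\.\d+)?)\s*['’′]\s*
        (?P<sec>\d+(?:\.\d+)?)\s*(?:["”″]|'')?\s*
        (?P<hem>[NSEWnsew])?""",
    re.VERBOSE,
)


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to decimal degrees."""
    match = _DMS.search(text)
    if match is None:
        raise ValueError(f"not a DMS coordinate: {text!r}")
    degrees = float(match["deg"])
    minutes = float(match["min"])
    seconds = float(match["sec"])
    # South and west hemispheres (or a leading minus sign) give negative values.
    negative = match["deg"].startswith("-") or (match["hem"] or "").upper() in ("S", "W")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if negative else value


print(dms_to_decimal("16° 03’ 53.80628”"))
```

With pandas, the same function can be applied to a whole column, for example `lat["Longitude"].map(dms_to_decimal)`, assuming every cell in that column holds one coordinate.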

1 Answer


Try this:

from tabula import read_pdf  # provided by the tabula-py package
dfs = read_pdf(file, pages="all", pandas_options={'header': None})
Wilian
  • Thank you very much, now I get tables without column names, which I can assign manually. What I cannot understand is why it creates a table with 3 columns and not 5, as in the original table. – YIF99 Nov 10 '21 at 10:54
  • Maybe because of the merged cells. I would have to test it with your pdf to be sure. – Wilian Nov 10 '21 at 10:58
  • Thank you for your kind support. I am sorry, but I do not have the right to share the file. – YIF99 Nov 10 '21 at 11:27
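
If the merged cells cannot be avoided at extraction time, one possible workaround is to split the merged text after the fact. The sketch below is hedged heavily: the PDF was never shared, so the sample string, the helper name `split_merged_cell`, and the assumption that a merged cell contains a label followed by exactly two degree/minute/second coordinates are all hypothetical.

```python
import re

# One coordinate such as 16° 03’ 53.80628” (format assumed from the single
# value quoted in the comments; the real table may differ).
DMS_PATTERN = re.compile(r"\d+\s*°\s*\d+\s*['’′]\s*\d+(?:\.\d+)?\s*[\"”″]")


def split_merged_cell(cell):
    """Split 'label coord coord' text into (label, first_coord, second_coord)."""
    coords = DMS_PATTERN.findall(cell)
    if len(coords) != 2:
        raise ValueError(f"expected two coordinates, found {len(coords)}: {cell!r}")
    # Everything before the first coordinate is treated as the label.
    label = cell[: cell.index(coords[0])].strip()
    return label, coords[0], coords[1]


# Hypothetical merged cell; the actual contents of the PDF are not known.
sample = "Pier 1 16° 03’ 53.80628” 108° 13’ 20.11234”"
print(split_merged_cell(sample))
```

Applied row by row to the merged column, this would give three separate values that can then be assigned to their own columns.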