I have tested my code several times and it worked every time, but it now raises a strange error, which I reproduce below. I am using tabula to read a PDF file; here is the code where the error occurs:

for it_page,page in enumerate(pages_id, start=0):
    print("page : ", page)
    tables = tabula.read_pdf(hermes_pdf_dir + "/" + pdf_name, pages = page)
    
    
    for i,table in enumerate(tables, start=1):
        print( "titre retenu : " + pages_id_titres[it_page][1] + f"_{i}.xlsx")
        table.to_excel(os.path.join(folder_name, pages_id_titres[it_page][1]  + " p" + str(page) + f"_{i}.xlsx"), index=False)

The error is at the line beginning with "tables = tabula.read_pdf(...)".

Most importantly, here is the full error message:

Traceback (most recent call last):
  File "get_pdfs_hermes.py", line 299, in <module>
    read_pdf_download_csv(pdf_name2)
  File "get_pdfs_hermes.py", line 199, in read_pdf_download_csv
    tables = tabula.read_pdf(hermes_pdf_dir + "/" + pdf_name, pages = page)
  File "C:\Users\virgi\Python\lib\site-packages\tabula\io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "C:\Users\virgi\Python\lib\site-packages\tabula\io.py", line 80, in _run
    result = subprocess.run(
  File "C:\Users\virgi\Python\lib\subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', 'C:\\Users\\virgi\\Python\\lib\\site-packages\\tabula\\tabula-1.0.4-jar-with-dependencies.jar', '--pages', '104', '--guess', '--format', 'JSON', 'C:\\Users\\virgi\\Desktop\\virgile_stuff\\prog\\banking analyst\\financial_data/data/hermes_data/hermes_2014_rapportannuel_en.pdf']' returned non-zero exit status 1.

It mentions Java dependencies (maybe because tabula is split into tabula-py and tabula-java?), and the most closely related issues I found for this kind of error say that Java should be updated, but I already have the very latest version on my computer. Any idea what the cause could be?

  • I can't see where it talks about Java needing to be installed, or java dependencies. Please share them. (The problem could simply be that the wrong version of the java command is being found via the $PATH that your python app is using.) – Stephen C Mar 05 '21 at 08:46
  • Stephen C: Indeed, my thoughts about updating Java come from what I found online about my issue; nothing in the error message clearly says so. As for the Java path, I checked and it points to java.exe, which was updated in late January 2021, so I assume it is the latest Java version (maybe I'm missing something there?). In any case, thanks for your answer, Stephen. – Virgile BRIAN Mar 06 '21 at 08:05
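
The PATH check discussed in the comments above can be sketched in Python as follows. This is only a minimal illustration, not something taken from the original code: the helper name `describe_command` is hypothetical, and it assumes the script runs with the same environment as the failing program. It reports which executable a bare `java` resolves to and what `java -version` prints.

```python
import shutil
import subprocess


def describe_command(name="java"):
    """Return (resolved_path, version_text) for a command found via PATH.

    Returns (None, None) when the command cannot be found at all.
    Hypothetical helper, written for illustration only.
    """
    path = shutil.which(name)
    if path is None:
        return None, None
    # `java -version` writes its banner to stderr, so capture both streams.
    proc = subprocess.run([path, "-version"], capture_output=True, text=True)
    version_text = (proc.stderr or proc.stdout).strip()
    return path, version_text


if __name__ == "__main__":
    path, version_text = describe_command("java")
    print("java resolved to:", path)
    print(version_text)
```

If the printed path is not the java.exe that was recently updated, an older installation is being picked up first on the PATH.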

1 Answer

Simply making an exception for the PDF file that was being processed when the error occurred seems to make everything work again. I think the issue comes from that page's encoding or something similar.
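
One way to make such an exception without stopping the whole run is to catch the `subprocess.CalledProcessError` shown in the traceback and move on to the next page. The sketch below is an assumption about how this could look, not the code actually used: `read_pages_skipping_failures` and `read_page` are hypothetical names, and the reader is passed in as a callable so the pattern can be tried without tabula or Java installed.

```python
import subprocess


def read_pages_skipping_failures(read_page, pages):
    """Call read_page(page) for every page and skip the ones that fail.

    read_page is any callable; with tabula it could be, for example,
    lambda page: tabula.read_pdf(pdf_path, pages=page)
    (pdf_path being a hypothetical variable holding the PDF location).
    Returns (results, failed): a dict page -> tables, and a list of
    (page, return_code) pairs for pages whose Java subprocess failed.
    """
    results = {}
    failed = []
    for page in pages:
        try:
            results[page] = read_page(page)
        except subprocess.CalledProcessError as err:
            # The subprocess exited with a non-zero status for this page:
            # record it and continue with the remaining pages.
            failed.append((page, err.returncode))
    return results, failed


if __name__ == "__main__":
    def fake_reader(page):
        # Stand-in for the real reader: pretend page 104 is the broken one.
        if page == 104:
            raise subprocess.CalledProcessError(1, ["java"])
        return [f"table from page {page}"]

    results, failed = read_pages_skipping_failures(fake_reader, [103, 104, 105])
    print("pages read:", sorted(results))
    print("pages skipped:", failed)
```

Keeping the list of skipped pages makes it possible to inspect the problematic PDF pages later instead of silently losing them.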