
I am extracting data from a PDF with tabula-py in a blob-triggered Python Azure Function app. It runs locally without issues, but when I deploy the function I get the following error:

Result: Failure
Exception: JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
Stack:   File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 315, in _handle__invocation_request
    self.__run_sync_func, invocation_id, fi.func, args)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 434, in __run_sync_func
    return func(**params)
  File "/home/site/wwwroot/Assessment/__init__.py", line 21, in main
    pdfTable = tabula.read_pdf(blob_to_read,pages='all',multiple_tables=True)
  File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/tabula/io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/tabula/io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)

Here is my code:

import logging
import azure.functions as func

import io
import re
import os
import tabula


def main(myblob: func.InputStream, blobout: func.Out[str], context: func.Context):
    logging.info(f"--- Python blob trigger function processed blob \n"
                 f"----- Name: {myblob.name}\n"
                 f"----- Blob Size: {myblob.length} bytes")

    inputblob = myblob.read()
    blob_to_read = io.BytesIO(inputblob)

    pdfTable = tabula.read_pdf(blob_to_read, pages='all', multiple_tables=True)

I have also tried Camelot, but ran into complications with the Ghostscript installation.

I am on a Consumption plan. Any help on how to solve this issue would be greatly appreciated.

Thank you.

SantiASC
How is this issue going? Did the answer below resolve your question? If not, would you please let me know the latest information about this issue? – Leo Liu May 28 '20 at 09:40

3 Answers


Exception: JavaNotFoundError When Running Tabula-py in a Python Azure Function App

As the error shows:

Exception: JavaNotFoundError: java command is not found from this Python process.Please ensure Java is installed and PATH is set for java

It states that Java is not installed, or that PATH is not set for Java, on your agent.

So you need to check whether Java is installed and whether PATH is set for it.

If not, you could try creating your own private agent; see How to Set Up a Private Agent in Visual Studio Team Services.
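To run that check from inside the deployed function, one option is to log what the Python process itself sees. This is a minimal diagnostic sketch (the helper name `find_java` is hypothetical): `shutil.which` searches the process's PATH for an executable, so it reports whether a `java` command is reachable from the same process that calls tabula-py. It avoids newer syntax so it should also run on the Python 3.6 worker shown in the stack trace.

```python
import logging
import os
import shutil
from typing import Optional


def find_java() -> Optional[str]:
    """Return the full path of the `java` executable visible to this process, or None.

    shutil.which looks in os.environ["PATH"], the same PATH that a child
    process started without an explicit env would inherit.
    """
    java_path = shutil.which("java")
    if java_path is None:
        # Log the PATH so you can see where the process actually looked.
        logging.warning("java not found on PATH: %s", os.environ.get("PATH", ""))
    else:
        logging.info("java found at %s", java_path)
    return java_path


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    find_java()
```

You could call `find_java()` at the top of `main` and look for the message in the function's logs after the next blob upload.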

Hope this helps.

Leo Liu

Check the tabula-py requirements: it requires Java 8. I suppose your function environment is a Linux function on the Python runtime, which doesn't have Java installed.

So the first option is to follow Create a function on Linux using a custom container and create your function from a custom image that includes the runtime you need.

Another possible option is to upload a JVM to Azure and set PATH in your function code; you could give that a try.
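A minimal sketch of that second option, assuming a Linux x64 JRE (matching the `LINUX/X64` worker in the stack trace) has been deployed together with the function code. The folder layout and the helper name `use_bundled_java` are hypothetical. The idea is to put the JVM's `bin` directory at the front of `PATH` before calling tabula-py: child processes started without an explicit environment inherit `os.environ`, so a `java` subprocess launched afterwards should pick up the bundled binary.

```python
import os


def use_bundled_java(java_home: str) -> str:
    """Move `<java_home>/bin` to the front of PATH and return the new PATH.

    `java_home` is assumed to point at a JRE deployed with the function app
    (hypothetical layout: <java_home>/bin/java). Subprocesses started later
    without an explicit `env` inherit os.environ, so they look up `java`
    in this directory first.
    """
    bin_dir = os.path.join(java_home, "bin")
    if not os.path.isdir(bin_dir):
        raise FileNotFoundError("no bin directory under " + java_home)
    # Drop empty entries and any existing copy of bin_dir, then prepend it once,
    # so calling this on every invocation does not keep growing PATH.
    others = [p for p in os.environ.get("PATH", "").split(os.pathsep)
              if p and p != bin_dir]
    os.environ["PATH"] = os.pathsep.join([bin_dir] + others)
    return os.environ["PATH"]
```

For example, if the JRE were placed in a hypothetical `jre` folder next to `__init__.py`, you would call `use_bundled_java(os.path.join(os.path.dirname(__file__), "jre"))` before `tabula.read_pdf(...)`. You may also need to make sure `bin/java` keeps its executable permission after deployment.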

George Chen

Hi, I was having the same issue and couldn't find a great answer, but I finally got it working. The trick is to install Java on the Ubuntu server in Azure App Service. To do that, you need to SSH into the server and then follow the steps below from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-on-ubuntu-18-04. I found that the sudo command was not recognised, but I got it working without it (i.e. you don't need to include sudo in the commands).

How to install Java on Ubuntu server on app service
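After installing Java this way, one way to confirm that it is usable and meets the Java 8 requirement mentioned above is to run `java -version` from Python, i.e. from the same kind of process that launches tabula-py's `java` command. This is a sketch with hypothetical helper names. It parses the version banner, which `java -version` writes to stderr, and handles both the legacy `1.8` numbering and the newer `11`, `17` style. It uses `stdout=PIPE`/`universal_newlines` rather than `capture_output`/`text` so it also runs on Python 3.6.

```python
import re
import subprocess
from typing import Optional

# Matches e.g. 'openjdk version "11.0.7"' or 'java version "1.8.0_252"'.
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Legacy releases use the "1.x" scheme (1.8 means Java 8); newer ones
    start with the major number directly (11, 17, ...).
    """
    match = _VERSION_RE.search(version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def java_major_version() -> Optional[int]:
    """Run `java -version`; return the major version, or None if java is missing."""
    try:
        completed = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # Same situation that tabula-py reports as JavaNotFoundError.
        return None
    # The version banner goes to stderr, not stdout.
    return parse_java_major(completed.stderr + completed.stdout)
```

A check such as `major = java_major_version()` followed by `major is not None and major >= 8` can then be logged before calling tabula-py.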

nicbowman