I am extracting data from a PDF with tabula-py in a blob-triggered Python Azure Function app. It runs locally without issues, but after I deploy the function I get the following error:
Result: Failure
Exception: JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
Stack:   File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 315, in _handle__invocation_request
    self.__run_sync_func, invocation_id, fi.func, args)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 434, in __run_sync_func
    return func(**params)
  File "/home/site/wwwroot/Assessment/__init__.py", line 21, in main
    pdfTable = tabula.read_pdf(blob_to_read,pages='all',multiple_tables=True)
  File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/tabula/io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/tabula/io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
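tabula-py is a wrapper around the Java library tabula-java, and the error says the `java` command cannot be found from this Python process. To confirm what the deployed worker actually sees, a minimal diagnostic sketch is to log what `shutil.which` finds on the worker's `PATH` and whether `JAVA_HOME` is set. The helper name `java_diagnostics` is hypothetical, and `shutil.which` only approximates how a child process resolves the `java` command.

```python
import logging
import os
import shutil
from typing import Dict, Optional


def java_diagnostics(search_path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Hypothetical helper: report whether a `java` executable is reachable.

    By default it searches this process's PATH, which is what a child process
    started from this worker would inherit. Pass a PATH-style string to search
    somewhere else instead.
    """
    path = os.environ.get("PATH", "") if search_path is None else search_path
    return {
        # None means no executable named `java` was found on `path`
        "java_path": shutil.which("java", path=path),
        # None means JAVA_HOME is not set in this process
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "PATH": path,
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    info = java_diagnostics()
    if info["java_path"] is None:
        logging.warning("java not found; PATH=%s JAVA_HOME=%s", info["PATH"], info["JAVA_HOME"])
    else:
        logging.info("java found at %s", info["java_path"])
```

Calling `java_diagnostics()` at the top of `main` and logging the result would show in the Functions logs whether the deployed environment has no Java at all, or has it installed somewhere that is not on `PATH`.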
Here is my code:
import logging
import azure.functions as func
import io
import re
import os
import tabula

def main(myblob: func.InputStream, blobout: func.Out[str], context: func.Context):
    logging.info(f"--- Python blob trigger function processed blob \n"
                 f"----- Name: {myblob.name}\n"
                 f"----- Blob Size: {myblob.length} bytes")
    # Read the triggering blob into memory and wrap it as a file-like object
    inputblob = myblob.read()
    blob_to_read = io.BytesIO(inputblob)
    # tabula-py needs Java here; this is the call that raises JavaNotFoundError after deployment
    pdfTable = tabula.read_pdf(blob_to_read, pages='all', multiple_tables=True)
I also tried Camelot, but ran into problems installing Ghostscript.
I am on a Consumption plan. Any help on how to solve this issue would be greatly appreciated.
Thank you.