tabula-py can't read the file when the Python script is called from Java

Question

I'm working on a Java-based project. The Java program runs a command that calls a Python script.

The Python script uses tabula-py to read a PDF file and return the data.
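A minimal sketch of how a Java program can launch such a script, assuming `python3` is on the PATH and that the script (hypothetically) takes the PDF path as its first command-line argument; the question does not say how the path is actually passed. It uses `ProcessBuilder` with an argument list instead of `Runtime.exec(String)`, because `Runtime.exec(String)` splits the command on whitespace and does not interpret quotes. It also merges stderr into stdout so any tabula-java error messages show up in the Java output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: run a Python script from Java and print everything it writes.
 * Assumption (hypothetical): the script reads the PDF path from sys.argv[1].
 */
public class RunPythonScript {

    // Build the argument list with absolute, normalized paths. Each element is
    // passed to the child process as one argument, so no shell quoting is needed.
    static List<String> buildCommand(String python, String script, String pdf) {
        List<String> cmd = new ArrayList<>();
        cmd.add(python);
        cmd.add(Paths.get(script).toAbsolutePath().normalize().toString());
        cmd.add(Paths.get(pdf).toAbsolutePath().normalize().toString());
        return cmd;
    }

    // Start the process in an explicit working directory, merge stderr into
    // stdout, print every line, and return the exit status.
    static int run(List<String> cmd, Path workDir) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(workDir.toFile());
        pb.redirectErrorStream(true);
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java RunPythonScript <script.py> <file.pdf>");
            return;
        }
        // Use the script's own directory as the working directory.
        Path workDir = Paths.get(args[0]).toAbsolutePath().getParent();
        int exit = run(buildCommand("python3", args[0], args[1]), workDir);
        System.out.println("exit status: " + exit);
    }
}
```

Usage (the script location is a placeholder): `java RunPythonScript /home/ubuntu/xxx.py /home/ubuntu/Documents/xxxx.pdf`.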

The Python script works when I call it directly in a terminal (python3 xxx.py).

However, when I call the Python script from Java, it throws this error:

Error from tabula-java:Error: File does not exist
Command '['java', '-Dfile.encoding=UTF8', '-jar', '/home/ubuntu/.local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar', '--pages', 'all', '--lattice', '--guess', '--format', 'JSON', '/home/ubuntu/Documents/xxxx.pdf']' returned non-zero exit status 1.

I tried calling the script by its full path, passing the PDF file by its full path, and adding the script's directory with sys.path.append(python script path), but none of these worked.
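When a command works in a terminal but fails when launched from another program, one frequent difference is that the launched process sees a different working directory, user account, or environment (for example when the Java program runs as a service user). A small diagnostic sketch, assuming it is run the same way as the real Java program, prints what the JVM itself sees; a child process started from the JVM inherits these unless they are overridden. The PDF path is the one from the error message and is only an example; this checks the situation, it is not a confirmed fix.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch: report the working directory, user, PATH, and whether a file is
 * visible and readable from this JVM.
 */
public class PathDiagnostics {

    // One-line report for a path as resolved by this JVM.
    static String report(String pathText) {
        Path abs = Paths.get(pathText).toAbsolutePath().normalize();
        return "path=" + abs
                + " exists=" + Files.exists(abs)
                + " readable=" + Files.isReadable(abs);
    }

    public static void main(String[] args) {
        // Child processes inherit these by default (same directory, user, environment).
        System.out.println("user.dir=" + System.getProperty("user.dir"));
        System.out.println("user.name=" + System.getProperty("user.name"));
        System.out.println("PATH=" + System.getenv("PATH"));
        // Example path taken from the error message; replace with the real one.
        System.out.println(report("/home/ubuntu/Documents/xxxx.pdf"));
    }
}
```

If `exists` or `readable` is false here, or `user.name` differs from the terminal user, the Python script launched from this JVM will most likely hit the same problem.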

I also tried running tabula-java directly with a java command, i.e. java -Dfile.encoding=UTF8 -jar /home/ubuntu/.local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar "file_path"

That works and reads the file. However, calling the Python script from Java still fails.

Is there any way to solve this? Using tabula directly in the Java program is not an option in my case.

score 0 · Answer 1 · 2021-11-29T05:32:58.540

Since you use Java for the base code and Python only for reading the PDF, it is better to do everything in Java. Why? Because Java libraries for reading PDFs, such as iText, already exist, so there is no need to struggle with linking one language to another.

Code (using iText 5):


import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

/**
 * This class is used to read an existing
 *  pdf file using iText jar.
 */
public class PDFReadExample {
    public static void main(String[] args) {
        PdfReader pdfReader = null;
        try {
            //Create PdfReader instance.
            pdfReader = new PdfReader("D:\\testFile.pdf");

            //Get the number of pages in pdf.
            int pages = pdfReader.getNumberOfPages();

            //Iterate the pdf through pages (iText page numbers start at 1).
            for (int i = 1; i <= pages; i++) {
                //Extract the page content using PdfTextExtractor.
                String pageContent =
                    PdfTextExtractor.getTextFromPage(pdfReader, i);

                //Print the page content on console.
                System.out.println("Content on Page "
                              + i + ": " + pageContent);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            //Close the PdfReader even if reading fails.
            if (pdfReader != null) {
                pdfReader.close();
            }
        }
    }
}

As it's currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. (Community, Nov 29 '21 at 04:45)
