
I keep getting this error while using Tabula in Python.

I've gone through every Stack Overflow question related to this, and blogs as well.

My JDK and JRE are up to date:

java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

My PATH is correctly defined in the environment variables.

Python is running on Anaconda:

Python 3.6.5 |Anaconda, Inc

df = tabula.read_pdf("C:\XXXXX\PDFExtractor\Test.pdf")

I've tried passing an encoding as well.

Tabula CalledProcessError:  Command '['java', '-jar', 'C:\\Users\\xxxxx\\AppData\\Local\\Continuum\\anaconda3\\lib\\site-packages\\tabula\\tabula-1.0.1-jar-with-dependencies.jar', '--pages', '1', '--guess', 'C:\\Users\\xxxxxx\\PDFExtractor\\Test.pdf']' returned non-zero exit status 2.
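Exit status 2 on its own says little; whatever the Java process wrote to stderr is usually more informative. Below is a minimal sketch of capturing it with the standard `subprocess` module. The helper name `run_and_capture` and the variables `jar_path` and `pdf_path` are assumptions for illustration, not part of tabula-py; the `--pages` and `--guess` flags are copied from the error message above.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return (exit_code, stdout, stderr) as text."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str; available on Python 3.6
    )
    return completed.returncode, completed.stdout, completed.stderr


# Hypothetical usage with the command from the error message
# (jar_path and pdf_path are placeholders you would fill in):
# code, out, err = run_and_capture(
#     ["java", "-jar", jar_path, "--pages", "1", "--guess", pdf_path]
# )
# print(code)
# print(err)  # the message Java printed before exiting
```

Unlike `subprocess.check_output`, this does not raise on a non-zero exit status, so the stderr text can be inspected directly.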

Appreciate the help.

Jmb
Pai
  • Status 2 *usually* means no such file or directory (`ENOENT`). Double check all your filenames. – cdarke Oct 04 '18 at 06:02
  • I rechecked my paths, for the dependencies as well as the file to be converted; they are correct. – Pai Oct 04 '18 at 06:15
  • What about `java`, is that in your `PATH`? – cdarke Oct 04 '18 at 06:59
  • Yes, it is. I have literally gone through 20+ Stack Overflow questions on this and done everything suggested. :( – Pai Oct 04 '18 at 07:10
  • It's unfortunate that you don't have anything like `strace` in your environment, since you have elected to use Windows. You have to track down which file it is having an issue with; that's not something anyone else can do without your exact environment and setup. – cdarke Oct 04 '18 at 07:14
  • I don't know what strace does, but we have a traceback on Anaconda. – Pai Oct 04 '18 at 07:29
  • 1st part of the traceback Error: Traceback (most recent call last): File "", line 1, in df = tabula.read_pdf("C:\\xxxxx\\PDFExtractor\\Test.pdf",pages=2) File "C:\xxxxx\AppData\Local\Continuum\anaconda3\lib\site-packages\tabula\wrapper.py", line 85, in read_pdf output = subprocess.check_output(args) – Pai Oct 04 '18 at 07:31
  • 2nd part of the traceback "C:\xxxxxx\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 336, in check_output **kwargs).stdout File "C:\xxxxx\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 418, in run output=stdout, stderr=stderr) – Pai Oct 04 '18 at 07:32
  • I've found the error. I basically ran `java -jar 'C:\Users\xxxxx\AppData\Local\Continuum\anaconda3\lib\site-packages\tabula\tabula-1.0.1-jar-with-dependencies.jar' 'C:\\Users\\xxxxxx\\PDFExtractor\\Test.pdf'` on the command line. It throws an error. – Pai Oct 04 '18 at 07:55
  • But if I replace the ' with " around the jar path, then it gives me the output of the parsed PDF on the command line: `java -jar "C:\Users\xxxxx\AppData\Local\Continuum\anaconda3\lib\site-packages\tabula\tabula-1.0.1-jar-with-dependencies.jar" 'C:\Users\xxxxxx\PDFExtractor\Test.pdf'` Now, how do I get Python to pass the first part in double quotes? – Pai Oct 04 '18 at 07:57
  • `cmd.exe` in Windows only accepts double quotes, not single. You currently have your filename in python inside double-quotes, but you could use single quotes and embed the double within the path, e.g.: `'"C:\XXXXX\PDFExtractor\Test.pdf"'` (python doesn't care if you use double or single quotes, provided they match). – cdarke Oct 04 '18 at 08:29
  • The issue is not with the double quotes around the file, '"C:\XXXXX\PDFExtractor\Test.pdf"'. The issue is with the JAR file it's executing: it's being passed in single quotes and not recognized. – Pai Oct 04 '18 at 08:57
  • That was only an example, you asked *How do i get python to pass the first part in double quotes?* – cdarke Oct 04 '18 at 08:59
  • aha. I see. Well. Thanks. – Pai Oct 04 '18 at 09:04
  • Is there anything I can do to make cmd recognize single quotes? – Pai Oct 04 '18 at 09:06
  • And is there another way to extract tables from PDFs in Python? Appreciate the help! – Pai Oct 04 '18 at 09:07

2 Answers


You need to escape backslashes or use a raw string:

df = tabula.read_pdf("C:\\XXXXX\\PDFExtractor\\Test.pdf")

or

df = tabula.read_pdf(r"C:\XXXXX\PDFExtractor\Test.pdf")

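Otherwise Python treats each backslash as the start of an escape sequence: a path containing `\t` or `\n` is silently altered, and one containing `\U` (as in `C:\Users`) is a syntax error.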
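A short sketch of how the spellings compare, using the placeholder path from the question. The `C:\temp` string is a made-up example, added only to show an escape sequence actually changing a path.

```python
from pathlib import PureWindowsPath

escaped = "C:\\XXXXX\\PDFExtractor\\Test.pdf"  # backslashes escaped
raw = r"C:\XXXXX\PDFExtractor\Test.pdf"        # raw string literal
forward = "C:/XXXXX/PDFExtractor/Test.pdf"     # forward slashes

# The escaped and raw literals are the same string.
print(escaped == raw)

# PureWindowsPath normalises forward slashes to backslashes.
print(str(PureWindowsPath(forward)) == raw)

# Made-up example: an unescaped "\t" becomes a single tab character,
# so the plain literal is one character shorter than the raw one.
print(len("C:\temp"), len(r"C:\temp"))
```

All three forms refer to the same file on Windows; the raw string is usually the least error-prone to type.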

Jmb

I've found the error. I basically ran java -jar 'C:\Users\xxxxx\AppData\Local\Continuum\anaconda3\lib\site-packages\tabula\tabula-1.0.1-jar-with-dependencies.jar' 'C:\Users\xxxxxx\PDFExtractor\Test.pdf' on the command line. It throws an error.

But if I replace the ' with " around the jar path, then it gives me the output of the parsed PDF on the command line:

java -jar "C:\Users\xxxxx\AppData\Local\Continuum\anaconda3\lib\site-packages\tabula\tabula-1.0.1-jar-with-dependencies.jar" 'C:\Users\xxxxxx\PDFExtractor\Test.pdf'

Now, how do I get Python to pass the first part in double quotes?
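It may be worth noting that the single quotes in the `CalledProcessError` message are Python's `repr` of the argument list, not characters sent to Java. On Windows, `subprocess` turns an argument list into a command line itself and, as far as I know, wraps arguments containing spaces in double quotes. The sketch below shows that conversion with `subprocess.list2cmdline`, a helper that exists in the module but is not part of its documented public interface. Both paths are hypothetical stand-ins, not the ones from the question.

```python
import subprocess

# Hypothetical paths, chosen so that one contains a space.
args = [
    "java",
    "-jar",
    r"C:\Program Files\tabula\tabula.jar",
    "--pages",
    "1",
    r"C:\docs\Test.pdf",
]

# What the error message displays: a Python list repr, with single
# quotes and doubled backslashes.
print(args)

# What subprocess would build as the Windows command line.
cmdline = subprocess.list2cmdline(args)
print(cmdline)
```

If that holds in your setup, the quoting is unlikely to be what makes the call fail from Python, and typing the list repr into `cmd.exe` is not an equivalent test of the same command.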

Pai