Textract: failed with exit code 127 // windows 10 // pdftotext

Question

I have a program that reads and converts a PDF file and enters the result into a Google Sheet. When I run it after packaging it with PyInstaller, I get the error shown below, and I cannot figure out what the problem is:

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\utils.py", line 82, in run
    pipe = subprocess.Popen(
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\tkinter\__init__.py", line 1883, in __call__
    return self.func(*args)
  File "EinkaufRGWindows.py", line 40, in InkoopRekeningen
    text = textract.process(str(importfolder) + str(i))
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\pdf_parser.py", line 28, in extract
    raise ex
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\pdf_parser.py", line 20, in extract
    return self.extract_pdftotext(filename, **kwargs)
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\pdf_parser.py", line 43, in extract_pdftotext
    stdout, _ = self.run(args)
  File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\utils.py", line 90, in run
    raise exceptions.ShellError(
textract.exceptions.ShellError: The command `pdftotext //Mac/Home/Desktop/Wickey Einkauf Test/Rekeningen/Lekkerkerker_ - 20803471.pdf -` failed with exit code 127
------------- stdout -------------
------------- stderr -------------

Could you copy the error into the question as a snippet? Links to external sites are generally frowned upon — PirateNinjas, Aug 11 '20 at 11:56
Is it the same error as your other question [textract-failed-with-exit-code-127-pdftotext-on-windows-10](https://stackoverflow.com/questions/63359767/textract-failed-with-exit-code-127-pdftotext-on-windows-10)? You need to install `Poppler` in the machine running your executable. — acw1668, Aug 12 '20 at 09:05
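The first traceback shows `subprocess.Popen` raising `FileNotFoundError`, which suggests the `pdftotext` executable itself could not be found, not the PDF. A minimal sketch for checking this on the machine that runs the program, using only the standard library (`find_executable` is a hypothetical helper name, not part of textract):

```python
import shutil


def find_executable(name="pdftotext"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


location = find_executable()
if location is None:
    # Nothing called pdftotext is visible to this Python process.
    print("pdftotext is not on PATH for this process")
else:
    print(f"pdftotext found at {location}")
```

If this prints that `pdftotext` is missing, installing Poppler (or adding its folder to PATH) is the likely next step.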

score 2 · Answer 1 · answered Jul 05 '22 at 09:29

I had the same issue. It seems to be an OS issue. For me, switching to Git Bash worked. See https://github.com/deanmalmgren/textract/issues/229

If you are using PyCharm, change the default terminal to Bash.
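If switching shells is not practical, for example in a packaged executable, one possible alternative is to extend PATH from inside the Python process before calling textract. This is a sketch under assumptions: it presumes the shell matters only because it changes which folders are on PATH, and `POPPLER_BIN` is a hypothetical folder that you would replace with wherever `pdftotext.exe` actually lives.

```python
import os


def prepend_to_path(directory):
    """Put `directory` first on PATH for this process and its child processes."""
    current = os.environ.get("PATH", "")
    if current:
        os.environ["PATH"] = directory + os.pathsep + current
    else:
        os.environ["PATH"] = directory


# Hypothetical location; replace with the folder that holds pdftotext.exe.
POPPLER_BIN = r"C:\poppler\bin"
prepend_to_path(POPPLER_BIN)

# After this point, textract.process(...) would start pdftotext with the
# extended PATH, because child processes inherit os.environ.
```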

Leena

This is the only solution for now, should be higher up. The other answers don't mention you NEED a bash shell to properly run textract on certain file types like pdf. I use PyCharm and this worked well for me. Use `C:\Users\\AppData\Local\Programs\Git\bin\bash.exe --login` in PyCharm's terminal shell picker path. – Nicholas Stommel Jul 12 '23 at 14:38

score 0 · Answer 2 · answered Aug 11 '20 at 15:06

It looks like you're getting a FileNotFoundError. According to the error message, the command being run is:

pdftotext //Mac/Home/Desktop/Wickey Einkauf Test/Rekeningen/Lekkerkerker_ - 20803471.pdf -

There are a couple of things I would look at here. Firstly, there is an extra slash at the start of your file path, which seems wrong. Secondly, the file path contains spaces but is not enclosed in quotation marks. As a result, pdftotext will read it as several separate command arguments rather than one. You can fix this by formatting your subprocess call so that the file path is wrapped in quotation marks, like so:

pdftotext "example file path.pdf" -
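When the call is made from Python, quoting matters only if the command is passed as a single string. If the arguments are passed as a list, a path with spaces stays one argument. The sketch below illustrates this with a child Python process standing in for `pdftotext`, so it runs without Poppler installed. Whether textract builds its command this way is an assumption; the command text in the error may simply be the argument list joined with spaces.

```python
import subprocess
import sys

# Stand-in for a PDF path that contains spaces.
path_with_spaces = "example file path.pdf"

# Stand-in for pdftotext: a child process that reports how many arguments it got.
command = [
    sys.executable,
    "-c",
    "import sys; print(len(sys.argv) - 1)",
    path_with_spaces,
    "-",
]

result = subprocess.run(command, capture_output=True, text=True, check=True)
argument_count = int(result.stdout.strip())
print(argument_count)  # the path and the trailing "-" arrive as 2 arguments
```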

The reason for the two slashes is that I am running this on a VM (Parallels Desktop). I don't think I can alter the pdftotext command, because it's not a line of code I have written myself. I use textract for my PDF files, and somehow it works perfectly fine on Mac but has issues on Windows. However, I need this running on Windows, since it is for a customer. — Thomas Broek, Aug 12 '20 at 07:21

score 0 · Answer 3 · answered Dec 03 '20 at 22:22

You need to install pdftotext using pip. To install it you need to have Microsoft Visual C++ 14 or greater.

Navid H.Arani

This doesn't address the problem. See @Leena 's answer below for the relevant GitHub issue with textract on Windows. For now you need to use a bash shell on Windows like Git Bash. You can switch to Git Bash (MinGW64) on Windows by using `C:\Users\\AppData\Local\Programs\Git\bin\bash.exe --login` as your terminal path in PyCharm. – Nicholas Stommel Jul 12 '23 at 14:41
