I'm trying to do the following, but I'm getting errors on my ExecuteStreamCommand:

Cannot run program "C:\Python36\pythonscript.py" error=193 not a valid Win32 application

This is running on my home Windows workstation. The flow is:

  1. GetFile (Get my PDF)
  2. ExecuteStreamCommand (Call Python script to parse PDF with Tika, and create JSON file)
  3. PutFile (Output file contains JSON that I will use later)

Does NiFi have a built-in PDF parser? Is there something more NiFi-compatible than Tika?

If not, how do I call one from ExecuteStreamCommand?

Regards and thanks in advance!

Vishal Upadhyay

2 Answers

Cannot run program "C:\Python36\pythonscript.py" error=193 not a valid Win32 application

ExecuteStreamCommand is trying to launch the .py file itself, and Windows cannot execute a Python script directly because it does not honor the shebang line (for example, #!/usr/bin/python on Linux). Set the command to your Python executable and pass the script as an argument.
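
As a rough illustration of the same idea outside NiFi, the sketch below launches a script by naming the interpreter as the program and the script as its argument. The helper name run_script is hypothetical, and sys.executable only stands in for the full path to python.exe on your machine.

```python
import subprocess
import sys


def run_script(script_path, stdin_bytes=b""):
    """Run a Python script by invoking the interpreter explicitly."""
    # The interpreter is the program; the script is just an argument.
    # sys.executable is an assumption standing in for your python.exe path.
    result = subprocess.run(
        [sys.executable, script_path],
        input=stdin_bytes,          # bytes piped to the script's STDIN
        stdout=subprocess.PIPE,     # capture what the script prints
        stderr=subprocess.PIPE,
        check=True,                 # raise if the script exits non-zero
    )
    return result.stdout
```

Passing the .py path alone as the program is what produces error 193 on Windows; with the interpreter first, the same script runs normally.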

Mike Thomsen

A Python script that uses the tika module, triggered from NiFi, is a good way to parse a PDF, since NiFi has no built-in option for this as of now.

You can also try other Python modules, such as PyPDF2 or pdfminer.

The script can then be configured in the ExecuteStreamCommand processor, on the Properties tab, as follows:

Command Path: path/to/python

Command Arguments: /path/to/pdf-parser.py

Ignore STDIN: false
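
With Ignore STDIN set to false, the flow file content (the PDF) is piped to the script's STDIN, and whatever the script writes to STDOUT becomes the output content. A minimal sketch of what pdf-parser.py might look like under those settings follows. The function names parse_pdf and main and the JSON layout are assumptions made for illustration, and it assumes the tika package is installed and that parser.from_buffer accepts the raw PDF bytes.

```python
import json
import sys


def parse_pdf(pdf_bytes):
    """Extract text and metadata from PDF bytes with the tika module."""
    # Imported here so the rest of the script loads without tika installed.
    # Assumption: tika is installed (pip install tika) and Java is available.
    from tika import parser

    parsed = parser.from_buffer(pdf_bytes)
    return {"metadata": parsed.get("metadata"), "content": parsed.get("content")}


def main(stdin=None, stdout=None, parse=parse_pdf):
    """Read a PDF from STDIN and write the parsed result as JSON to STDOUT."""
    if stdin is None:
        stdin = sys.stdin.buffer  # binary stream: a PDF is not text
    if stdout is None:
        stdout = sys.stdout
    pdf_bytes = stdin.read()
    # json.dump escapes non-ASCII by default, so console encoding is not an issue.
    json.dump(parse(pdf_bytes), stdout)

# When saved as the actual script, finish the file with a call to main().
```

The parse argument exists only so the STDIN/STDOUT plumbing can be tried with a stand-in function before Tika is set up.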
Ashok Thakur