I am trying to cat the contents of a file and pipe it into the stdin of a second python script, then put the stdout of that into another file.

On the command line it looks something like this:

cat input_file | python3 ~/Desktop/python_script.py > output_file

I have tried to do it like this after reading a number of posts:

file_input = subprocess.Popen(('cat', input_file), stdout=subprocess.PIPE)
file_output = subprocess.check_output(('python3', '~/Desktop/mdparser.py'), stdin=file_input.stdout, stdout=subprocess.PIPE)
subprocess.check_output('>','output_file',stdin = file_output.stdout)

However I get the following error for the second line:

File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/subprocess.py", line 598, in check_output
    raise ValueError('stdout argument not allowed, it will be overridden.')
ValueError: stdout argument not allowed, it will be overridden.
holmeswatson
  • You don't need, and shouldn't use, `cat` for this command. `python3 python_script.py <input_file >output_file` is considerably more efficient. – Charles Duffy Jan 20 '15 at 20:31
  • ...thus, `stdin` on the `subprocess` call should point directly to a handle on the input file. – Charles Duffy Jan 20 '15 at 20:32
  • ...and, well, `>` isn't a process at all, so I don't know what you're trying to do there; it shouldn't have its own `subprocess.Popen` object. Instead, `stdout` on the second process should point straight to a file handle, not to a PIPE, if you want the output to go to a file. – Charles Duffy Jan 20 '15 at 20:33
  • You should put the necessary functionality into a function or class and import the module instead of running it as a subprocess. If you don't want to change the script, you could set `sys.stdin` and `sys.stdout` to file objects and call [`runpy.run_path()`](https://docs.python.org/3/library/runpy.html#runpy.run_path). – jfs Jan 22 '15 at 04:31

1 Answer

This should be only one call, not three.

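import os
import subprocess
with open('input_file', 'r') as infile, open('output_file', 'w') as outfile:
    exit_status = subprocess.call(
        ['python3', os.path.expanduser('~/Desktop/mdparser.py')],
        stdin=infile, stdout=outfile)  # both handles are closed when the block exits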

Tilde expansion (~/foo) is processed by the shell; when you don't have a shell, as here, you need to explicitly do it yourself -- that's what os.path.expanduser does.
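
To see what that expansion actually does, here is a minimal sketch. The helper name `resolve_script` is made up for illustration, and the result depends on the home directory of whoever runs it, so no output is shown:

```python
import os


def resolve_script(path):
    """Expand a leading ~ the way a shell would before running the command.

    Hypothetical helper name; it only wraps os.path.expanduser.
    """
    return os.path.expanduser(path)


literal = '~/Desktop/mdparser.py'
expanded = resolve_script(literal)

# Without a shell, subprocess would receive the literal string, tilde and all.
print(literal, '->', expanded)
```

Paths that don't start with a tilde come back unchanged, so it's safe to pass every script path through it.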

You can't use check_output() when stdout is redirected, whether to a different process or a file -- this is why the exception is thrown, as the Python interpreter can't both read the content into a variable and connect it directly into a pipeline to a different process. That's what the message means about "will be overridden" -- when you use check_output(), you're telling the Python interpreter to read output from a pipeline itself, but it can't do that when you configure that output to go to a different process or a file.
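
A small sketch that reproduces the error from the question and shows the same call working once `stdout` is left alone. The child command here is just a stand-in (the current interpreter printing a word), not the asker's script:

```python
import subprocess
import sys

# Stand-in child process (an assumption for this demo): prints one word.
command = [sys.executable, '-c', 'print("hi")']

# Passing stdout= to check_output() is rejected before the child even starts.
try:
    subprocess.check_output(command, stdout=subprocess.PIPE)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Leaving stdout alone lets check_output() capture the bytes itself.
output = subprocess.check_output(command)

print(error_message)
print(output)
```

Passing `stdin=` to `check_output()` is fine; it's only `stdout` that it insists on owning.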

Instead, direct the output straight to the file, and open the file and read it when done.
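
Roughly, that could look like the sketch below. `run_and_read` is a hypothetical helper name, and the child is a stand-in one-liner that upper-cases its stdin; substitute the real script's command list.

```python
import subprocess
import sys

# Stand-in for the real script (an assumption): upper-cases whatever arrives on stdin.
CHILD = 'import sys; sys.stdout.write(sys.stdin.read().upper())'


def run_and_read(input_path, output_path):
    """Run the child with stdin/stdout attached to files, then read the result back."""
    with open(input_path, 'r') as infile, open(output_path, 'w') as outfile:
        # check_call raises CalledProcessError if the child exits non-zero.
        subprocess.check_call([sys.executable, '-c', CHILD],
                              stdin=infile, stdout=outfile)
    # The child has exited and the handle is closed, so the file is complete.
    with open(output_path, 'r') as result:
        return result.read()
```

`check_call()` is used here instead of `call()` only so that a failing child raises rather than returning a status you might forget to check; either works.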


The other reason not to use cat is that all it does is add inefficiency and restrict operation. When you run:

foo <input.txt >output.txt

...or, if you prefer the form...

<input.txt foo >output.txt

...the foo program gets a file handle directly on input.txt, and another directly on output.txt. When you don't use cat, those file handles are the real deal -- it's possible to seek around in the files, meaning that if your program would have to go back and review prior content, it can just tell the file handle to go back and seek to a different part. By contrast, if you ran cat input.txt | foo, then foo would have to store everything it read in memory if the operation it's performing requires more than one pass.
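
You can observe the difference from Python. In this sketch the child is a stand-in one-liner that reports whether its stdin is seekable; the two helper names are made up for the demo. It assumes Python 3.5+ for `subprocess.run()`.

```python
import subprocess
import sys

# Stand-in child (an assumption): reports whether its stdin supports seeking.
CHILD = 'import sys; print(sys.stdin.seekable())'


def seekable_with_file_stdin(path):
    """Equivalent of `foo <path`: the child gets a handle on the file itself."""
    with open(path, 'rb') as handle:
        out = subprocess.check_output([sys.executable, '-c', CHILD], stdin=handle)
    return out.decode().strip()


def seekable_with_pipe_stdin(path):
    """Equivalent of `cat path | foo`: the child only sees one end of a pipe."""
    with open(path, 'rb') as handle:
        data = handle.read()
    # input= makes subprocess create a pipe and write the data into it.
    done = subprocess.run([sys.executable, '-c', CHILD], input=data,
                          stdout=subprocess.PIPE, check=True)
    return done.stdout.decode().strip()
```

With a regular file as input, the first helper should report `True` and the second `False`.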

Using cat is just overhead here -- it's an extra program that reads from the input file and writes to its half of the pipeline, after all, meaning that it's doing extra IO to and from the pipe and context switches to and from the kernel. Don't use it unless you need to -- such as if you're concatenating multiple files into a single stream (which is cat's purpose, hence its name).
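
If you do need the multi-file case, you still don't need `cat` from Python. A rough sketch, assuming a hypothetical helper named `run_on_concatenated` and a child that reads all of its stdin:

```python
import shutil
import subprocess
import sys


def run_on_concatenated(paths, command, output_path):
    """Stream several files, in order, into one child's stdin.

    The child's stdout goes straight to output_path, so nothing has to be
    read back through a pipe while the input is still being written.
    """
    with open(output_path, 'wb') as outfile:
        proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=outfile)
        try:
            for path in paths:
                with open(path, 'rb') as infile:
                    shutil.copyfileobj(infile, proc.stdin)
        finally:
            # Closing stdin is what tells the child it has reached end of input.
            proc.stdin.close()
        return proc.wait()
```

This is the one situation where a pipe is doing real work, since there is no single file to hand the child directly.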

Charles Duffy