
From my Python script I need to call two external binaries in sequence to process a file in two steps:

import os, subprocess
# check_call() blocks until the program exits, so step 2 starts only after step 1 is done
subprocess.check_call(['program_1', '-i', 'input.file', '-o', 'temp.file'])
subprocess.check_call(['program_2', '-i', 'temp.file', '-o', 'output.file'])
os.remove('temp.file')

However, it would be nice to speed up the pipeline and reduce disk usage by using virtual, RAM-based files instead of 'physical' disk-based ones. I know that I can use StringIO or tempfile.SpooledTemporaryFile() to handle virtual files within a Python script, but is it possible to pass a reference to such a file to an external binary?

Roman

2 Answers


Assuming that you can tell your two programs to read from stdin and write to stdout, you can pipe the output of one subprocess directly into the other:

import subprocess
sp1 = subprocess.Popen(['program_1', '-i', 'input.file'], stdout=subprocess.PIPE)
sp2 = subprocess.Popen(['program_2', '-o', 'output.file'], stdin=sp1.stdout)
sp1.stdout.close()  # lets program_1 receive SIGPIPE if program_2 exits early
sp2.communicate()
sp1.wait()  # reap program_1 as well

See https://docs.python.org/2/library/subprocess.html#replacing-shell-pipeline.
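To try the stdin/stdout pipeline pattern without the real binaries, here is a minimal sketch (Python 3) in which program_1 and program_2 are replaced by two tiny hypothetical Python child processes launched with sys.executable. Only the Popen/stdout=PIPE/stdin= wiring is the technique from the answer; the STEP1/STEP2 scripts are illustrative stand-ins.

```python
import subprocess
import sys

# Hypothetical stand-ins for program_1 / program_2:
# STEP1 writes three lines to stdout, STEP2 upper-cases whatever arrives on stdin.
STEP1 = "import sys\nfor i in range(3): sys.stdout.write('line %d\\n' % i)"
STEP2 = "import sys\nsys.stdout.write(sys.stdin.read().upper())"


def run_pipeline():
    # The first process writes into an anonymous OS pipe instead of a file.
    sp1 = subprocess.Popen([sys.executable, '-c', STEP1], stdout=subprocess.PIPE)
    # The second process reads the other end of that pipe as its stdin.
    sp2 = subprocess.Popen([sys.executable, '-c', STEP2],
                           stdin=sp1.stdout, stdout=subprocess.PIPE)
    # Drop the parent's copy of the read end so sp1 gets SIGPIPE if sp2 dies.
    sp1.stdout.close()
    out, _ = sp2.communicate()
    sp1.wait()
    return out.decode()


if __name__ == '__main__':
    print(run_pipeline())
```

No intermediate file is created at any point; the data only passes through kernel pipe buffers.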

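Another option (Unix only) is to use a named pipe, created at the OS level (e.g. mkfifo /tmp/mypipe, or os.mkfifo() from Python). This works as long as the programs read and write the file strictly sequentially, because a FIFO does not support seeking: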

import os, subprocess
os.mkfifo('/tmp/mypipe')
sp1 = subprocess.Popen(['program_1', '-i', 'input.file', '-o', '/tmp/mypipe'])
sp2 = subprocess.Popen(['program_2', '-i', '/tmp/mypipe', '-o', 'output.file'])
sp1.wait()
sp2.wait()
os.remove('/tmp/mypipe')  # the FIFO node persists until it is removed

And it should also be possible to use os.pipe().
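One way the os.pipe() idea could work for programs that insist on a path is a sketch like the following, assuming Linux (or macOS), where /dev/fd/N names an already-open file descriptor, and Python 3.2+ for Popen's pass_fds. The WRITER/READER scripts are hypothetical stand-ins for program_1/program_2 that only accept file paths.

```python
import os
import subprocess
import sys

# Hypothetical stand-ins that, like the programs in the question,
# only accept file *paths* rather than stdin/stdout.
WRITER = "import sys\nwith open(sys.argv[1], 'w') as f: f.write('data from step 1\\n')"
READER = "import sys\nwith open(sys.argv[1]) as f: sys.stdout.write(f.read().upper())"


def run_with_os_pipe():
    r, w = os.pipe()
    # /dev/fd/N refers to descriptor N of the opening process, so each child
    # is handed a path that points at its end of the anonymous pipe.
    reader = subprocess.Popen([sys.executable, '-c', READER, '/dev/fd/%d' % r],
                              pass_fds=(r,), stdout=subprocess.PIPE)
    writer = subprocess.Popen([sys.executable, '-c', WRITER, '/dev/fd/%d' % w],
                              pass_fds=(w,))
    # The parent must close its own copies, or the reader never sees EOF.
    os.close(r)
    os.close(w)
    out, _ = reader.communicate()
    writer.wait()
    return out.decode()


if __name__ == '__main__':
    print(run_with_os_pipe())
```

Compared with a named pipe, nothing appears in the file system at all, but the /dev/fd trick is platform-specific and the programs still have to read and write sequentially.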

mhawke
  • The problem is that the programs cannot read from stdin or write to stdout. The second option seems to use the HDD as well, so it is just a way to put the temporary file somewhere else, not to avoid creating it. – Roman Oct 29 '14 at 13:35
  • The named pipe (FIFO) is not really a file in the sense that user data is never written to the disk. At least on Linux, the kernel relays the data between the reading and writing processes without writing it to the file system. In addition, the reading and writing processes block on I/O where appropriate, e.g. the reader blocks if there is nothing to read, unless it has opened the pipe in non-blocking mode. Given that the target programs cannot use stdin/stdout, a named pipe is probably the best solution. – mhawke Oct 30 '14 at 10:47
  • Is it possible to use several pipes simultaneously? My script is multithreaded, so I expect a clash between threads with this code. – Roman Oct 30 '14 at 12:42
  • Yes. If there is a unique pipe for each reader/writer pair (which I suppose means one per thread in your case), there should not be a problem. Just pass a unique name for each one to `os.mkfifo()`. I don't think you will be able to share a pipe: writes of up to PIPE_BUF bytes (4096 on Linux) are atomic, but you'll have no idea which thread will read the written data. – mhawke Oct 30 '14 at 12:54
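A minimal sketch of "one unique FIFO per thread", assuming Unix and Python 3: each job creates a private directory with tempfile.mkdtemp(), so FIFO names can never collide between threads. STEP1/STEP2 are hypothetical stand-ins for program_1/program_2, and process()/run_jobs() are illustrative names, not part of any library.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import threading

# Hypothetical stand-ins: STEP1 writes to the path in argv[2],
# STEP2 reads the path in argv[1] and upper-cases it to stdout.
STEP1 = "import sys\nwith open(sys.argv[2], 'w') as f: f.write('job %s\\n' % sys.argv[1])"
STEP2 = "import sys\nwith open(sys.argv[1]) as f: sys.stdout.write(f.read().upper())"


def process(job_id, results):
    # A private directory per call guarantees a unique FIFO path per thread.
    workdir = tempfile.mkdtemp()
    fifo = os.path.join(workdir, 'pipe')
    try:
        os.mkfifo(fifo)
        # Opening a FIFO blocks until the other end is opened too,
        # so start both processes before waiting on either of them.
        sp1 = subprocess.Popen([sys.executable, '-c', STEP1, str(job_id), fifo])
        sp2 = subprocess.Popen([sys.executable, '-c', STEP2, fifo],
                               stdout=subprocess.PIPE)
        out, _ = sp2.communicate()
        sp1.wait()
        results[job_id] = out.decode().strip()
    finally:
        shutil.rmtree(workdir)  # removes the FIFO node and its directory


def run_jobs(n):
    results = {}
    threads = [threading.Thread(target=process, args=(i, results)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == '__main__':
    print(run_jobs(4))
```

One caveat of this sketch: if the writer fails before it opens the FIFO, the reader blocks forever, so real code may want a timeout on communicate().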
from subprocess import check_call
from tempfile import NamedTemporaryFile

tmp = NamedTemporaryFile('w+')
# check_call() waits for each program, so step 2 reads a finished file
check_call(['program_1', '-i', 'input.file', '-o', tmp.name])
check_call(['program_2', '-i', tmp.name, '-o', 'output.file'])
tmp.close()  # closing a NamedTemporaryFile deletes it

When tmp is closed, the temporary file is deleted automatically.

Mauro Baraldi
  • This also does not place the temporary file in RAM. – Roman Oct 29 '14 at 13:38
  • What you're looking for is the [mmap](https://docs.python.org/2/library/mmap.html) module – Mauro Baraldi Oct 29 '14 at 16:02
  • When you call an external binary that does not work with stdin/stdout, you have to pass a string containing the path to the file with the '-i'/'-o' option. The tempfile module provides the 'name' attribute for this, but mmap doesn't seem to have an equivalent. Probably it's not possible at all. – Roman Oct 29 '14 at 16:28
  • I think the solution would be something like creating a RAM disk and storing the temporary files there. That should definitely improve performance and save some of the HDD's lifetime, but it is less safe. Maybe there is something better at the OS level (I use Ubuntu 14.04.1, by the way). – Roman Oct 29 '14 at 16:35
  • mmap uses the filesystem, which is not what you want. A named pipe (FIFO) is the way to go: it does not store user data on the filesystem, and the FIFO special file just provides a reference point. See `man fifo` and `man mkfifo`. – mhawke Oct 30 '14 at 10:56
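The RAM-disk idea from the comments can be sketched with NamedTemporaryFile's dir parameter, assuming a Linux system such as Ubuntu where /dev/shm is a RAM-backed tmpfs (the code falls back to the default temp directory if /dev/shm is not writable). Unlike a FIFO, this gives the programs an ordinary, seekable file. STEP1/STEP2 and two_steps() are hypothetical stand-ins for program_1/program_2 and the question's two-step flow.

```python
import os
import subprocess
import sys
import tempfile

# Assumption: /dev/shm is a writable tmpfs (true on typical Linux installs).
RAM_DIR = '/dev/shm' if os.access('/dev/shm', os.W_OK) else None

# Hypothetical stand-ins that only accept file paths.
STEP1 = "import sys\nwith open(sys.argv[1], 'w') as o: o.write('intermediate\\n')"
STEP2 = ("import sys\nwith open(sys.argv[1]) as i, open(sys.argv[2], 'w') as o:"
         " o.write(i.read().upper())")


def two_steps(out_path):
    # delete=True (the default) removes the file when the with-block exits.
    with tempfile.NamedTemporaryFile(dir=RAM_DIR, suffix='.tmp') as tmp:
        subprocess.check_call([sys.executable, '-c', STEP1, tmp.name])
        subprocess.check_call([sys.executable, '-c', STEP2, tmp.name, out_path])
        name = tmp.name
    return name
```

The intermediate data still passes through a file, but that file lives in memory; it counts against RAM (and /dev/shm may be size-limited, e.g. in containers), and it is lost on reboot.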