I'm trying to grep from a directory and limit the search to the first 100 results. The following code keeps yielding

[..]
grep: writing output: Broken pipe
grep: writing output: Broken pipe
grep: writing output: Broken pipe
grep: writing output: Broken pipe
[..]

The code:

from subprocess import Popen, PIPE

p_grep = Popen(['/bin/bash', '-c', 'grep -F "asdasdasd" data/*'], stdout=PIPE)
p_head = Popen(['head', '-100'], stdin=p_grep.stdout, stdout=PIPE)
output = p_head.communicate()[0]

How can I fix it?

pistacchio
  • try this: http://stackoverflow.com/questions/2595602/pythons-popen-cleanup – xkrz Jan 27 '12 at 21:49
  • @xkrz, isn't the suggested solution there exactly what he's doing? – Rob Wouters Jan 27 '12 at 21:56
  • Do you need to execute grep or is this only an example? Otherwise, grep has a --max-count option you can use instead of piping the whole output through. – GaretJax Jan 27 '12 at 21:58
  • --max-count limits the number of matches per file; I need to limit the total number of matches across all files. – pistacchio Jan 27 '12 at 22:15
  • @RobWouters, you are right, the example given at the end is exactly what pistacchio is doing. My apologies for not reading the other post to the end. – xkrz Jan 27 '12 at 22:26

2 Answers

Actually in this case you can do:

from subprocess import check_output
output = check_output(['/bin/bash', '-c', 'grep -F "asdasdasd" data/* | head -100'])
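
A minimal sketch of the same one-liner for Python 3, assuming a POSIX system where `sh`, `head` and `yes` are available. The helper name `shell_head` and the `yes match` producer are illustrative stand-ins, not part of the answer. On Python 3.2 and later, `subprocess` resets SIGPIPE to its default action in child processes by default (`restore_signals=True`), so the producer should be stopped shortly after `head` exits instead of running to completion.

```python
import subprocess


def shell_head(command, limit):
    """Run `command | head -n limit` in a shell and return the output lines.

    `command` is handed to the shell unmodified, so only pass trusted strings.
    """
    pipeline = "{} | head -n {}".format(command, int(limit))
    # The shell's exit status is that of `head`, the last command in the pipeline.
    out = subprocess.check_output(pipeline, shell=True)
    return out.decode().splitlines()


# Hypothetical endless producer standing in for the grep command.
print(shell_head("yes match", 3))  # prints ['match', 'match', 'match']
```

Because `yes` never stops on its own, the call returning at all suggests the producer was terminated once `head` had its lines.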
Rob Wouters
  • The problem with this is that, I don't know why, it executes ALL the grepping and only then pipes it through head, so while it only takes a few seconds in the shell, it takes much longer in Python. – pistacchio Jan 27 '12 at 22:11
  • @pistacchio, can you try passing `--line-buffered` to `grep` and see if that changes anything? – Rob Wouters Jan 27 '12 at 22:22
  • I can't think of any reason why there would be a difference. Are you absolutely certain you are comparing the exact same commands? The only other thing to try is passing `bufsize=1` to check_output. – Rob Wouters Jan 27 '12 at 22:32

According to the Popen documentation on pipelines, the parent should close its own copy of the upstream process's stdout (here p_grep.stdout) once the downstream process has been started. Otherwise the parent keeps the read end of the pipe open, and p_grep never receives a SIGPIPE when p_head exits.

Furthermore, according to this post, it's important to give each subprocess a setup function (preexec_fn) that restores SIGPIPE to its default behavior. Python itself ignores SIGPIPE, and child processes inherit that setting unless it is reset.

So the code becomes:

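import signal
from subprocess import Popen, PIPE

def preexec_fn():
    # Runs in the child just before exec: restore the default SIGPIPE action,
    # because Python ignores SIGPIPE and children inherit that setting.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

p_grep = Popen(['/bin/bash', '-c', 'grep -F "asdasdasd" data/*'], stdout=PIPE, preexec_fn=preexec_fn)
p_head = Popen(['head', '-100'], stdin=p_grep.stdout, stdout=PIPE, preexec_fn=preexec_fn)
p_grep.stdout.close()  # let p_grep receive SIGPIPE if p_head exits first
output = p_head.communicate()[0]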

That should cause the grep process to terminate once head completes.
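
If spawning `head` is not essential, the limit can also be enforced from Python. The sketch below is a swapped-in variation, not the approach described above: it reads the first lines from the producer's pipe with `itertools.islice`, then closes the pipe and terminates the producer. The function name `first_lines` and the sample producer are hypothetical.

```python
import subprocess
import sys
from itertools import islice


def first_lines(argv, limit):
    """Return the first `limit` stdout lines of `argv`, then stop the process."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        # islice stops pulling from the pipe after `limit` lines.
        lines = [raw.decode().rstrip("\r\n") for raw in islice(proc.stdout, limit)]
    finally:
        proc.stdout.close()  # the reader is done with the pipe
        proc.terminate()     # do not wait for the producer to finish by itself
        proc.wait()          # reap the child process
    return lines


# Hypothetical producer standing in for the grep command.
producer = [sys.executable, "-c", "for i in range(10**6): print(i)"]
print(first_lines(producer, 3))  # prints ['0', '1', '2']
```

For the grep case, something like `first_lines(['grep', '-F', 'asdasdasd'] + glob.glob('data/*'), 100)` would be one way to use it without going through a shell, assuming `glob` is imported.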

Blanka