57

I often want to write commands like this (in zsh, if it's relevant):

find <somebasedirectory> | \
    grep stringinfilenamesIwant | \
    grep -v stringinfilesnamesIdont | \
    xargs dosomecommand

(or more complex combinations of greps)

In recent years find has added the -print0 switch and xargs has added -0, which together handle file names containing spaces cleanly by NUL-terminating each name, allowing for this:

find <somebasedirectory> -print0 | xargs -0 dosomecommand

However, grep (at least the version I have, GNU grep 2.10 on Ubuntu), doesn't seem to have an equivalent to consume and generate null-terminated lines; it has --null, but that only seems related to using -l to output names when searching in files directly with grep.

Is there an equivalent option or combination of options I can use with grep? Alternatively, is there an easy and elegant way to express my pipe of commands simply using find's -regex, or perhaps Perl?

Andrew Ferrier
  • 4
    The `-print0` option is normally needed only to handle file names containing newlines, not other whitespace, because the conventional newline separator (used with `-print`) works fine for those. – pabouk - Ukraine stay strong Dec 04 '13 at 07:53
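
To make the separator issue concrete, here is a minimal sketch, assuming GNU find and coreutils. The temporary directory and file names are made up for illustration. It counts how many "names" a newline-separated listing appears to contain compared with a NUL-separated one.

```shell
# Sketch: compare newline- and NUL-separated listings (hypothetical file names).
tmp=$(mktemp -d)
touch "$tmp/with space.txt" "$tmp/with
newline.txt"

# Line-based counting: the embedded newline splits one name into two lines.
lines=$(find "$tmp" -type f | wc -l)

# NUL-based counting: one NUL terminator per real file.
nuls=$(find "$tmp" -type f -print0 | tr -cd '\0' | wc -c)

echo "newline-separated: $lines, NUL-separated: $nuls"
rm -rf "$tmp"
```

The name with a space survives a line-oriented grep intact, but note that plain xargs (without -0) also splits its input on blanks, so spaces can still matter further down the pipeline.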

6 Answers

60

Use GNU Grep's --null Flag

According to the GNU Grep documentation, you can use Output Line Prefix Control to handle ASCII NUL characters the same way as find and xargs.

-Z
--null
Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file name. For example, ‘grep -lZ’ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names containing unusual characters like newlines. This option can be used with commands like ‘find -print0’, ‘perl -0’, ‘sort -z’, and ‘xargs -0’ to process arbitrary file names, even those that contain newline characters.

Use tr from GNU Coreutils

As the OP correctly points out, this flag is most useful when handling filenames on input or output. In order to actually convert grep output to use NUL characters as line endings, you'd need to use a tool like sed or tr to transform each line of output. For example:

find /etc/passwd -print0 |
    xargs -0 grep -EZ 'root|www' |
    tr "\n" "\0" |
    xargs -0 -n1

This pipeline uses NULs to separate the filenames coming from find, and then converts newlines to NULs in the strings returned by grep. That passes NUL-terminated strings to the next command in the pipeline, which in this case is just xargs turning the output back into normal strings, but it could be anything you want.

Todd A. Jacobs
  • 3
    Hmm, I'm not sure about this. I just wrote that this switch wasn't relevant (as I mentioned in my original question), since the manpage implies to me that it's only relevant when used in combination with switches that generate filenames (e.g. `-l`). However, some rudimentary testing isn't so clear. More investigation needed. Apologies for the premature downvote, which I can't undo. – Andrew Ferrier Apr 13 '13 at 15:49
  • 4
    tr solution is great for all those commands that don't have a print0 like option. – Mordechai Aug 28 '14 at 10:46
  • 5
    The whole point in using `-0` and `-z` switches is that filenames may contain linefeeds in them. Using `tr` amounts to not using the switches at all. – gcscaglia May 21 '16 at 15:30
  • 1
    *technically*, this fails in the case of a filename with a newline in it (which is legal for better or worse). I've never seen this happen, but it's the same reason people yell at you (or me) for parsing `ls` - edge cases. – Wyatt Ward Jul 05 '16 at 00:58
  • @AndrewFerrier I too read it as related to filenames only, i.e. when using grep with -l. Then I found this guy who filed a request with the grep maintainers, and they treated this limitation as a bug: http://stackoverflow.com/questions/36066536/how-to-make-grep-separate-output-by-null-characters. So in future versions of grep we should have the null-separated output working even without -l – reallynice Sep 15 '16 at 15:16
  • 5
    I had to use `--null-data` instead of `--null`. I'm not 100% sure why, but it appears from `grep --help` that `--null-data` might change grep's behaviour to use null-termination, whereas `--null` will only _output_ null-termination - not take it into account when processing _input_. – Tim Malone Apr 28 '18 at 07:14
  • 1
    [Relevant post on meta](https://meta.stackoverflow.com/q/405459/8967612). – 41686d6564 stands w. Palestine Feb 23 '21 at 03:13
8

As you are already using GNU find, you can use its built-in regular expression matching instead of these greps, e.g.:

find <somebasedirectory> -regex ".*stringinfilenamesIwant.*" ! -regex ".*stringinfilesnamesIdont.*" -exec dosomecommand {} + 
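
A small runnable sketch of the same idea, assuming GNU find. The temporary directory, the file names and the use of printf in place of dosomecommand are all illustrative assumptions. Running find from inside the directory keeps the random temporary path out of the regex match, since -regex is tested against the whole path.

```shell
# Sketch: select names containing "want" but not "dont" using find alone.
tmp=$(mktemp -d)
touch "$tmp/report want.txt" "$tmp/report want dont.txt" "$tmp/other.txt"

# -regex must match the entire path, hence the leading and trailing ".*".
# printf stands in for "dosomecommand" and prints each selected path on its own line.
selected=$(cd "$tmp" && find . -type f -regex '.*want.*' ! -regex '.*dont.*' -exec printf '%s\n' {} +)

echo "$selected"
rm -rf "$tmp"
```

No separator handling is needed here because find hands the names to the command as arguments rather than through a pipe.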
jlliagre
4

Use

find <somebasedirectory> -print0 | \
 grep -z stringinfilenamesIwant | \
 grep -zv stringinfilesnamesIdont | \
 xargs -0 dosomecommand

However, the pattern may not contain a newline; see the bug report.
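
Here is a self-contained sketch of that pipeline, assuming GNU grep, find and xargs. The file names, the `$tmp.list` scratch file and the `sh -c` stand-in for dosomecommand are made up for illustration. Note that -z is the short form of --null-data, which is a different option from -Z/--null.

```shell
# Sketch: filter NUL-terminated file names with grep -z (hypothetical names).
tmp=$(mktemp -d)
touch "$tmp/want one.txt" "$tmp/want
two.txt" "$tmp/want dont.txt" "$tmp/other.txt"

# -z makes grep read and write NUL-terminated records instead of lines,
# so the name containing a newline stays in one piece.
(cd "$tmp" && find . -type f -print0) |
    grep -z 'want' |
    grep -zv 'dont' > "$tmp.list"

# Stand-in for "dosomecommand": report how many arguments xargs passes on.
count=$(xargs -0 sh -c 'echo "$#"' sh < "$tmp.list")
echo "files selected: $count"

rm -rf "$tmp" "$tmp.list"
```

Both "want one.txt" and the name with the embedded newline get through, while "want dont.txt" and "other.txt" are filtered out.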

jarno
3

The newest version of the GNU grep source can now use -z/--null to separate the output by null characters, while it previously only worked in conjunction with -l:

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=cce2fd5520bba35cf9b264de2f1b6131304f19d2

This means that your issue is solved automatically when using the newest version.

chtenb
2

Instead of using a pipe, you can use find's -exec with the + terminator. To chain multiple commands together, you can spawn a shell in -exec.

find ./ -type f -exec bash -c 'printf "%s\0" "$@" | grep -z stringinfilenamesIwant | grep -zv something | xargs -0 dosomething' -- {} +
jordanm
  • Will this spawn a new bash shell for each file found by find? I can't figure that out from the find manpage... – Andrew Ferrier Apr 12 '13 at 16:41
  • @AndrewFerrier - no, the `+` terminator causes it to function similarly to `xargs`. One shell will be spawned and all files will be passed in. This also works in all POSIX versions of find, unlike `-print0`. – jordanm Apr 12 '13 at 16:45
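
The batching behaviour described in the comment above can be checked with a short sketch. The temporary directory and file names are illustrative assumptions. Each shell that find starts prints the number of file arguments it received.

```shell
# Sketch: see how many files each spawned shell receives with "-exec ... {} +".
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"

# The word after the script ("sh") becomes $0; the found files become "$@".
batches=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "$batches"
rm -rf "$tmp"
```

With three files this prints a single count of 3, meaning one shell handled them all. With enough files to exceed the system's argument-length limit, find would start additional shells, much as xargs does.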
-3
find <somebasedirectory> -print0 | xargs -0 -I % grep something '%'