
The exact same command:

man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '

sometimes gives the expected output:

       6      Couldn't resolve host. The given remote host was not resolved.

and sometimes gives the error:

Binary file (standard input) matches

For example:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
Binary file (standard input) matches

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
Binary file (standard input) matches

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

Versions of relevant packages:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

$ grep --version
grep (GNU grep) 2.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

$ man --version
man 2.7.5

$ curl --version
curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets

I'm really scratching my head over this one.

I've solved my issue by adding the -a flag to my greps:

man curl | grep -Pzoa 'EXIT CODES(.|\n)*AUTHORS' | grep -a ' 6 '

But I'm truly stumped as to why it only errors sometimes.

Dean Rather

1 Answer


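Because the -z option is used, the first grep terminates its output with a NUL character instead of a newline. What happens next depends on the vagaries of buffering. The output can reach the second grep in more than one read, and the second grep checks the data it has read for NUL bytes to decide whether the input is binary. If the NUL has already arrived when the second grep examines the part of the input containing the match, it decides that the input is binary and prints "Binary file (standard input) matches" instead of the line. If the match arrives in an earlier read, before the NUL, it prints the line you want.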
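A quick way to see that NUL is to pass the first grep's output through od -c. The sketch below is a minimal, self-contained illustration assuming GNU grep built with -P support; the printf text is a hypothetical stand-in for the relevant part of the man curl output, not the real page.

```shell
#!/bin/sh
# Hypothetical stand-in for `man curl` output (not the real man page text).
# grep -z treats the whole input as one record, and -o prints the match
# terminated by a NUL byte instead of a newline.
# od -c shows every byte; the NUL appears as \0 at the very end.
printf 'NAME\nEXIT CODES\n       6      Could not resolve host.\nAUTHORS\nSEE ALSO\n' |
  grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' |
  od -c
```

The last character od prints should be \0; that trailing byte is what can make the second grep classify its input as binary.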

So, this happened to work for me:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.

However, if I put the output of the first grep in a temporary file and asked the second grep to read that, the second grep always complained about the input being binary:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' >tmpfile;  grep '  6  ' tmpfile
Binary file tmpfile matches
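Reading from a file makes the result reproducible, because the small file arrives in grep's buffer in one piece, match and NUL together. A minimal sketch, assuming GNU grep; the file contents are a hypothetical stand-in for the first grep's output, and the exact wording of the binary-file message depends on the grep version (newer releases print it to standard error).

```shell
#!/bin/sh
# Hypothetical stand-in for the first grep's output: text terminated by a NUL
printf 'EXIT CODES\n       6      Could not resolve host.\nAUTHORS\0' > tmpfile

# Without -a: grep sees the NUL, treats tmpfile as binary and only
# reports that it matches instead of printing the line
grep '  6  ' tmpfile

# With -a (--text): the input is processed as text and the line is printed
grep -a '  6  ' tmpfile
```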

Alternative: use awk

One way of avoiding the NUL-character issue, as well as reducing the number of processes required, is to use awk:

$ man curl | awk '/EXIT CODES/,/AUTHORS/{if (/   6   /) print}'
       6      Couldn't resolve host. The given remote host was not resolved.
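The awk range pattern /EXIT CODES/,/AUTHORS/ is true from the first line matching EXIT CODES through the next line matching AUTHORS, so the inner test only sees that section. A small sketch on hypothetical sample text (the OPTIONS line is made up for illustration) shows that a matching line outside the section is ignored:

```shell
#!/bin/sh
# Hypothetical sample: a "6" line appears both outside and inside EXIT CODES
printf '%s\n' \
  'OPTIONS' \
  '       6      a line outside the section' \
  'EXIT CODES' \
  '       6      Could not resolve host.' \
  'AUTHORS' |
  awk '/EXIT CODES/,/AUTHORS/ { if (/   6   /) print }'
```

Only the line inside the EXIT CODES section should be printed.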

Alternative: use sed

$ man curl | sed -n '/EXIT CODES/,/AUTHORS/{/   6   /p}'
       6      Couldn't resolve host. The given remote host was not resolved.
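The sed command uses the same range address. GNU sed accepts the brace group as written; some other implementations (BSD sed, for instance) may want a ; before the closing }, so a more portable spelling is sketched below on hypothetical sample text.

```shell
#!/bin/sh
# Hypothetical sample text standing in for `man curl` output.
# -n: print nothing by default; p prints matching lines inside the range.
# The ; before } keeps stricter (non-GNU) seds happy.
printf '%s\n' \
  'OPTIONS' \
  '       6      a line outside the section' \
  'EXIT CODES' \
  '       6      Could not resolve host.' \
  'AUTHORS' |
  sed -n '/EXIT CODES/,/AUTHORS/{/   6   /p;}'
```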

Alternative: use greps and tr

As tripleee suggests, another option is to use tr to replace the NUL with a newline:

$ man curl | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | tr '\000' '\n' | grep '  6  '
       6      Couldn't resolve host. The given remote host was not resolved.
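With tr in the pipeline the second grep never sees a NUL, so the result no longer depends on timing. A sketch, assuming GNU grep and tr, on hypothetical sample text; tr -dc '\000' keeps only NUL bytes, so wc -c counts them before and after the translation.

```shell
#!/bin/sh
# Hypothetical stand-in for `man curl` output (not the real page)
sample='NAME\nEXIT CODES\n       6      Could not resolve host.\nAUTHORS\nSEE ALSO\n'

# NUL bytes in the first grep's output: grep -z emits one
printf '%b' "$sample" | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | tr -dc '\000' | wc -c

# NUL bytes after tr has turned it into a newline: none
printf '%b' "$sample" | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | tr '\000' '\n' | tr -dc '\000' | wc -c

# The second grep therefore always sees plain text and prints the line
printf '%b' "$sample" | grep -Pzo 'EXIT CODES(.|\n)*AUTHORS' | tr '\000' '\n' | grep '  6  '
```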
John1024