16

The Unix 'file' command has a -0 option that outputs a null character after the filename. This is supposedly useful in combination with 'cut'.

From man file:

-0, --print0
         Output a null character ‘\0’ after the end of the filename. Nice
         to cut(1) the output. This does not affect the separator which is
         still printed.

(Note, on my Linux, the '-F' separator is NOT printed - which makes more sense to me.)

How can you use 'cut' to extract a filename from output of 'file'?

This is what I want to do:

find . -type f | file -n0iNf - | cut -d<null> -f1

where <null> is the NUL character.

Well, that is the command I am trying to write. What I actually want is to get all file names in a directory tree that have a particular MIME type. I filter with grep (not shown).

I want to handle all legal file names and not get stuck on file names with colons, for example, in their name. Hence, NUL would be excellent.

I guess non-cut solutions are fine too, but I hate to give up on a simple idea.

M Somerville
philcolbourn

3 Answers

27

Just specify an empty delimiter:

cut -d '' -f1

Notes:

  • The space between -d and '' is important: it makes the shell pass -d and the empty string as separate arguments. If you write -d'', the shell passes just -d, and cut then tries to use -f1 as the delimiter and fails with the error message "the delimiter must be a single character".
  • Per the comments to this answer, there are some systems where cut doesn't support using the null character as the delimiter. Fortunately, other answers on this page provide solutions that should work on such systems.
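
As a minimal sketch of the technique, the following feeds cut a line shaped like the output of file -0 (name, NUL, description). The input is simulated with printf so that it runs without file, and first_nul_field is a hypothetical helper name. It assumes a cut, such as GNU cut, that treats an empty -d argument as the NUL character.

```shell
# Hypothetical helper: print the text before the first NUL on each line.
# Assumes a cut (e.g. GNU cut) that treats an empty -d argument as NUL.
first_nul_field() {
    cut -d '' -f1
}

# Simulated `file -0 -i` output for a file name that contains a colon.
printf './dir/a:b.txt\0 text/plain; charset=us-ascii\n' | first_nul_field
```

On a system whose cut accepts the empty delimiter, this should print the complete name, colon included.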
ruakh
  • That is probably the only thing I didn't try! Thanks! – philcolbourn Mar 24 '12 at 02:12
  • @Sukima: It worked for both me and the OP, so obviously it's not correct to say that "This does not work at all." The OP specified that (s)he was using Linux (with possibly Linux-specific background -- the fact that `man file` said (s)he could use `cut` to split on null characters). Since your situation is different, you may need to open a new question. – ruakh Jul 14 '14 at 14:35
  • `cut: bad delimiter` on OSX. – Daniele Orlando Jan 18 '16 at 22:54
  • Thanks. Unfortunately this doesn't work for busybox version of cut. – Diego Nov 14 '19 at 10:51
1

This works with GNU awk.

awk 'BEGIN{FS="\x00"}{print$1}'
mklement0
C. Paul Bond
  • Nice; it's actually a portable approach that should work with all `awk` implementations - I've also tried with the macOS version and with `mawk`. – mklement0 Apr 20 '22 at 21:36
  • In GNU awk and `mawk` you could simplify to: `awk -F '\0' '{ print $1 }'` – mklement0 Apr 20 '22 at 21:48
1
  • ruakh's helpful answer works well on Linux.

  • On macOS, the cut utility doesn't accept '' as a delimiter argument; it fails with a "bad delimiter" error.

Here is a portable workaround that works on both platforms, via the tr utility; it only makes one assumption:

  • The input mustn't contain \1 control characters (START OF HEADING, U+0001) - which is unlikely in text.

  • You can substitute any character known not to occur in the input for \1. If that character can be written verbatim in a string, the solution gets simpler, because the -d argument no longer needs the auxiliary command substitution ($(...)) with a printf call.

  • If your shell supports so-called ANSI C-quoted strings - which is true of bash, zsh and ksh - you can replace "$(printf '\1')" with $'\1'

(The following uses a simpler input command to demonstrate the technique).

# In zsh, bash, ksh you can simplify "$(printf '\1')" to $'\1'
$ printf '[first field 1]\0[rest 1]\n[first field 2]\0[rest 2]' |
    tr '\0' '\1' | cut -d "$(printf '\1')" -f 1

[first field 1]
[first field 2]
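
Applied to the original goal (listing the files that have a given MIME type), the same tr step could be combined with a filter on the second field. The following is only a sketch under stated assumptions: names_with_mime is a hypothetical helper, awk is swapped in for cut so that the MIME type is matched against the description rather than the file name, and the file names are assumed to contain neither \1 bytes nor newlines. The input is simulated with printf so that the example runs without file.

```shell
# Hypothetical helper: reads `file -0 -i`-style lines on stdin
# (NAME, NUL, DESCRIPTION) and prints each NAME whose DESCRIPTION
# contains the string given as $1.
# Assumes file names contain no \1 (SOH) bytes and no newlines.
names_with_mime() {
    soh=$(printf '\1')
    tr '\0' '\1' |
        awk -F "$soh" -v mime="$1" 'index($2, mime) { print $1 }'
}

# Simulated input: two names, one of them containing a colon.
printf './a:b.txt\0 text/plain; charset=us-ascii\n./pic.png\0 image/png; charset=binary\n' |
    names_with_mime 'text/plain'

# Real-world use would look something like:
#   find . -type f | file -n0iNf - | names_with_mime 'text/plain'
```

Because the match uses index() on the second field only, it should not matter whether a given version of file prints its ':' separator after the NUL.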

Alternatives to using cut:

mklement0
  • +1, but why `"$(printf '\1')"` rather than `$'\1'`? Does the shell that ships with macOS not support `$'...'`? – ruakh Apr 20 '22 at 21:27
  • Thanks, @ruakh - good point. It wouldn't make the solution portable any longer, but it's definitely worth mentioning - please see my update. – mklement0 Apr 20 '22 at 21:34