Say I have the following structure of files and directories:
```
$ tree
.
├── a
├── b
└── dir
    └── c

1 directory, 3 files
```
That is, two files, a and b, together with a directory, dir, which contains another file, c.
I want to process all the files with awk (GNU Awk 4.1.1, to be exact), so I do something like this:
```
$ gawk '{print FILENAME; nextfile}' * */*
a
b
awk: cmd. line:1: warning: command line argument `dir' is a directory: skipped
dir/c
```
All is fine, but the * also expands to the directory dir, and awk tries to process it.
So I wonder: is there any native way for awk to check whether a given argument is a directory and, if so, skip it? That is, without using system() for it.
I made it work by calling system() in a BEGINFILE block:
```
$ gawk 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}} ENDFILE{print FILENAME, FNR}' * */*
a
a 10
a.wk
a.wk 3
b
b 10
dir
dir is a dir, skipping
dir/c
dir/c 10
```
Note also that if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile} works counterintuitively: I expected system() to return 1 when the test is true, but it returns the command's exit status, which is 0 on success. That is why the test has to be negated with !.
I read in A.5, "Extensions in gawk Not in POSIX awk":
- Directories on the command line produce a warning and are skipped (see Command-line directories)
And then the linked page says:
4.11 Directories on the Command Line
According to the POSIX standard, files named on the awk command line must be text files; it is a fatal error if they are not. Most versions of awk treat a directory on the command line as a fatal error.
By default, gawk produces a warning for a directory on the command line, but otherwise ignores it. This makes it easier to use shell wildcards with your awk program:
$ gawk -f whizprog.awk * Directories could kill this program
If either of the --posix or --traditional options is given, then gawk reverts to treating a directory on the command line as a fatal error.
See Extension Sample Readdir, for a way to treat directories as usable data from an awk program.
And indeed, essentially the same command as before fails when run with --posix:
```
$ gawk --posix 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}} ENDFILE{print FILENAME, NR}' * */*
gawk: cmd. line:1: fatal: cannot open file `dir' for reading (Is a directory)
```
I checked section 16.7.6, "Reading Directories", which is linked above, and it talks about readdir:
The readdir extension adds an input parser for directories. The usage is as follows:
```
@load "readdir"
```
But I am not sure how to call it, nor how to use it from the command line.