2

I have a directory of files with filenames of the form file000.txt to filennn.txt. I would like to be able to specify a range of file names and print the content of those files based on a match. I have achieved it with a single file pattern:

$ gawk 'FILENAME ~/file038.txt/ {print FILENAME, $0}' file*.txt
file038.txt Some 038 text here

But I cannot get a pattern that would allow me to specify a range of file names, for instance

gawk 'FILENAME ~/file[038-040].txt/ {print FILENAME, $0}' file*.txt

I'm sure I'm missing something simple here, I'm an AWK newbie. Any suggestions?

WYSIWYG
pelorus32

4 Answers

2

You can do some substitution on the filename, for example:

awk '{x=FILENAME;gsub(/[^0-9]/,"",x);x+=0}x>10&&x<50{your logic}' file*.txt

This way, files file011.txt through file049.txt are handled by "your logic".

You can adjust the condition x>10&&x<50, for example to handle only files whose number is odd or even; just write a boolean expression there.
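For anyone puzzled by the gsub step, here is a minimal, self-contained sketch of the same idea. The temporary directory, the file numbers, and the file contents are made up purely for illustration; only the range test comes from the command above. It runs awk from inside the directory on purpose, because any digits elsewhere in a path would end up in x as well.

```shell
# Hypothetical demo files, created only for this illustration.
demo=$(mktemp -d)
for n in 009 011 038 049 050; do
    printf 'Some %s text here\n' "$n" > "$demo/file$n.txt"
done

out=$(cd "$demo" && awk '
    {
        x = FILENAME            # copy the name, e.g. "file038.txt"
        gsub(/[^0-9]/, "", x)   # delete every non-digit, leaving "038"
        x += 0                  # force a number, so "038" becomes 38
    }
    x > 10 && x < 50 { print FILENAME, $0 }
' file*.txt)

printf '%s\n' "$out"
rm -rf "$demo"
```

Only file011.txt, file038.txt and file049.txt pass the test here; file009.txt and file050.txt fall outside the range and are skipped.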

Kent
  • This works perfectly, including tolerating missing files. Problem is... I don't understand quite why :-) It's the gsub bit I'll have to work to understand. – pelorus32 Oct 14 '14 at 09:54
  • @pelorus32 add `print x;` after each statement in the first `{...}` to see how `x` was modified. – Kent Oct 14 '14 at 09:59
  • As the Q mentions gawk, and I assume gawk > 4, this would put the test once per file instead of once per line: `gawk 'BEGINFILE{x=FILENAME;gsub(/[^0-9]/,"",x);x+=0;if(x<11||x>49){nextfile}};{your logic here}' *` – joepd Oct 14 '14 at 18:37
0

Solution using gawk and a recent version of bash

There is a bash primitive to handle file[038-040].txt. It makes the code quite simple:

gawk 'FNR==1 {print FILENAME, $0} {quit}' file{038..040}.txt

Key points:

  • FNR==1 {print FILENAME, $0}

    This prints the filename and the first line of each file

  • {quit}

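    This is meant to save time by skipping directly to the next file. Note, though, that quit is not an awk statement: awk treats it as an uninitialized variable and does nothing, so the rest of the file is still read and it is FNR==1 that limits the output to the first line. In gawk, the statement that actually skips ahead is nextfile.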

  • file{038..040}.txt

    The construct {038..040} is a bash feature called brace expansion. bash will replace this with the file names that you want. If you want to test out brace expansion to see how it works, try it on the command line with this simple statement:

    echo file{038..040}.txt
    

UPDATE 1: Mac OSX currently uses bash v3.2 which does not support leading zeros in brace expansion.
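For shells without zero-padded brace expansion, one possible workaround is to build the list of names with printf. This is only a sketch: the bounds 38 and 40 are the example range from the question, and the loop reuses the shell's positional parameters as a scratch list.

```shell
# Build file038.txt .. file040.txt without relying on brace expansion.
set --                       # clear the positional parameters
i=38
while [ "$i" -le 40 ]; do
    # %03d pads the number with zeros to three digits
    set -- "$@" "$(printf 'file%03d.txt' "$i")"
    i=$((i + 1))
done
echo "$@"
```

The names can then be passed to gawk as "$@" in place of file{038..040}.txt.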

UPDATE 2: If there are missing files and you have a modern gawk (v4.0 or better), use this instead:

gawk 'BEGINFILE{ if (ERRNO) nextfile} FNR==1 {print FILENAME, $0} {quit}' file{038..040}.txt

Solution using gawk with a plain POSIX shell

gawk '{n=0+substr(FILENAME,5,3)} FNR==1 && n>=38 && n<=40 {print FILENAME, $0} {quit}' file*.txt

Explanation:

  • n=0+substr(FILENAME,5,3)

    Extract the number from the filename. 0+ is a trick to force awk to treat n as numeric.

  • n>=38 && n<=40 {print FILENAME, $0}

    This selects the file based on its number and prints the filename and first line.

  • {quit}

    As before, this is meant to stop awk from reading the rest of each file. Because quit is not an awk statement it has no effect; nextfile is what gawk provides for this.

  • file*.txt

    This can be expanded by any POSIX shell to the list of file names.
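As the comments below point out, quit does nothing in awk. A hedged sketch of the same approach with nextfile in its place follows. It assumes an awk that supports nextfile (gawk does; very old awks may not), and the temporary directory and file contents are invented for the demonstration. file039.txt is deliberately absent to show that a gap in the sequence causes no error when the shell glob supplies the names.

```shell
# Hypothetical demo files with two lines each; file039.txt is left out on purpose.
demo=$(mktemp -d)
for n in 037 038 040 041; do
    printf 'first line of %s\nsecond line of %s\n' "$n" "$n" > "$demo/file$n.txt"
done

out=$(cd "$demo" && awk '
    FNR == 1 {
        n = 0 + substr(FILENAME, 5, 3)    # "038" -> 38
        if (n >= 38 && n <= 40) print FILENAME, $0
        nextfile                          # skip the rest of this file
    }
' file*.txt)

printf '%s\n' "$out"
rm -rf "$demo"
```

Since nextfile runs on the first line of every file, awk never reads past line 1, whether or not the file is in range.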

John1024
  • Nearly there: I tried this code and here was the result: `gawk 'FNR==1 {print FILENAME, $0} {quit}' file{038..040}.txt gawk: cmd. line:1: fatal: cannot open file `file38.txt' for reading (No such file or directory)` it appears to drop the leading zero, but I like the approach. – pelorus32 Oct 14 '14 at 08:01
  • @pelorus32 Very strange. It does not do that for me. It will keep as many zeros as I choose to add. Run `bash --version` and tell me what it says? I am on v4.2.37 – John1024 Oct 14 '14 at 08:04
  • @pelorus32 I added a solution that doesn't need bash's brace expansion. – John1024 Oct 14 '14 at 08:16
  • That might be the issue - thank Apple `GNU bash, version 3.2.51(1)-release (x86_64-apple-darwin13) Copyright (C) 2007 Free Software Foundation, Inc.` maybe I should use the Linux machine – pelorus32 Oct 14 '14 at 08:16
  • The result of the echo command shows the issue: `$ echo file{038..040}.txt file38.txt file39.txt file40.txt` – pelorus32 Oct 14 '14 at 08:18
  • @pelorus32 OK. v3.2 is old. See my second solution which does not use the brace feature. – John1024 Oct 14 '14 at 08:32
  • No need for `FNR==1` if you are quitting on the first line anyway –  Oct 14 '14 at 08:37
  • The original solution works on Ubuntu 14.04 however it doesn't cope with missing files in the sequence...should have mentioned that in the original question :-) It dies with a fatal error and doesn't parse the files after the missing one. – pelorus32 Oct 14 '14 at 08:48
  • @Jidder Did you try it? My gawk (GNU Awk 4.0.1) did not actually quit after the first line unless `FNR==1` was specified. – John1024 Oct 14 '14 at 08:54
  • @pelorus32 Good point about missing files: I added code to handle that. It requires `gawk` 4.0 or better. For older `gawk`, then, the POSIX-shell solution is better. – John1024 Oct 14 '14 at 09:10
  • @John1024 I wrongly assumed `quit` did something and I just hadn't seen it before. It actually does nothing, which is why you need `FNR==1` as the whole file is read. –  Oct 14 '14 at 09:14
  • The alternative for any POSIX shell works on both Mac and Ubuntu. Thanks – pelorus32 Oct 14 '14 at 10:05
0

An odd way, but something along these lines:

awk '{ if (match(FILENAME,/file0[3-4][0-8].txt/)) { print FILENAME, $0}}' file*.txt
SMA
  • why not `awk 'FILENAME~/file0[3-4][0-8].txt/{ print FILENAME, $0}'` –  Oct 14 '14 at 08:39
  • It would pick up even 030/031, which is not needed, hence I said something along these lines :) – SMA Oct 14 '14 at 08:41
0

Should work

awk '(x=FILENAME)~/(3[8-9]|40).txt$/{print x,$0;quit}' file*.txt

As quit doesn't work (at least with my version of awk), here is another way

awk 'FNR==((x=FILENAME)~/(3[8-9]|40).txt$/){print x,$0}' file*.txt
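A bracket expression such as [038-040] matches a single character from a set, not a numeric range, which is why the range has to be spelled out with alternation. Below is a small sketch of a stricter variant of the pattern above, printing every line of the matching files as in the question. The anchoring, the escaped dot, and the demo files are my own additions for illustration.

```shell
# Hypothetical demo files, created only for this illustration.
demo=$(mktemp -d)
for n in 030 038 039 040 048; do
    printf 'Some %s text here\n' "$n" > "$demo/file$n.txt"
done

# ^ and $ anchor the whole name; \. matches a literal dot.
out=$(cd "$demo" && awk 'FILENAME ~ /^file0(3[89]|40)\.txt$/ { print FILENAME, $0 }' file*.txt)

printf '%s\n' "$out"
rm -rf "$demo"
```

Here only file038.txt, file039.txt and file040.txt match; file030.txt and file048.txt are left out.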