I want awk to process every file in all sub-directories, decompressing the gzipped ones first.

This is what I've tried, but it only works for files in the top-level directory:

for file in *
do
    awk -v f="$file" 'NF > 0 {print f; nextfile}' <(gunzip -cf $(find . -type f) "$file")
done

(My goal is not just to print the names of non-empty files with awk; that is only an example.)

manish ma
  • do the zipped files unzip into single files, or a collection of files? if the latter, do you want `awk` to process the collection as a single file or as the individual files? – markp-fuso Aug 13 '23 at 12:42
  • are you looking to print the name of the zipped or unzipped files? – markp-fuso Aug 13 '23 at 12:44
  • Are we to assume all files are zipped? Or do your zipped files have a characteristic extension such as `.gz`? – Mark Setchell Aug 13 '23 at 13:45
  • @markp-fuso I want to print the file names(both zipped and unzipped), and later I want to cat and sort all information from all files - output to a single big file – manish ma Aug 13 '23 at 13:48
  • @MarkSetchell .gz and some files are not zipped – manish ma Aug 13 '23 at 13:50

3 Answers

3

Assumptions:

  • we want awk to print the name of the zipped file
  • in the case of a zipped file that consists of a group of files we (still) only want to print the name of the zipped file, otherwise OP will need to provide more details on how to process the zipped file

One idea:

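# Feed every regular file under the current directory to awk, one at a time.
while IFS= read -r file    # IFS= keeps leading/trailing whitespace in names
do
    # gunzip -cf decompresses gzip data and copies anything else unchanged
    awk -v f="$file" 'NF>0 {print f; nextfile}' <(gunzip -cf "$file")
done < <(find . -type f)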
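
A variant of the loop above, as a sketch only: it assumes bash and a `find` that supports `-print0` (GNU and BSD both do), so that file names containing newlines survive intact. The function name `first_nonblank` is made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every file under "$1" (default: .)
# that has at least one non-blank line, decompressing gzip data on the fly.
first_nonblank() {
    local dir=${1:-.} file
    # -print0 and read -d '' use NUL as the separator, the only byte
    # that cannot appear in a path name.
    while IFS= read -r -d '' file
    do
        # awk reads a single stream here, so exit does the job of nextfile.
        awk -v f="$file" 'NF > 0 {print f; exit}' <(gunzip -cf -- "$file")
    done < <(find "$dir" -type f -print0)
}
```

Calling `first_nonblank .` would behave like the original loop; passing another directory restricts the search to that tree.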
markp-fuso
  • Thanks it works! can you please explain the < < operator? – manish ma Aug 13 '23 at 14:54
  • it's a combination of process substitution and redirection; see [this for a brief explanation](https://stackoverflow.com/a/28927847) or [this - specifically example 23-2](https://tldp.org/LDP/abs/html/process-sub.html) – markp-fuso Aug 13 '23 at 15:23
  • @markp-fuso : doesn't `find . -type f` also include all the `.something` "hidden" files, e.g. `.zshenv` `.bash_profile` etc ? – RARE Kpop Manifesto Aug 14 '23 at 12:58
  • @RAREKpopManifesto sure, if OP happens to be sitting in the home directory; I'm merely re-using what OP has provided (namely, `find . -type f`); I'm guessing if OP runs into problems with too-many, or too-few, or 'wrong' files being scooped up by this process then they'll modify the `find` accordingly (or add some conditions to filter out unwanted files) – markp-fuso Aug 14 '23 at 13:33
  • @markp-fuso : cool just checking to be sure ( i was getting all those lovely macos `.DS_Store` when i tried the command and thought I did it all wrong ) – RARE Kpop Manifesto Aug 15 '23 at 09:28
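
On the `< <(...)` question raised in the comments above, a small sketch may help (bash assumed; both function names are invented for this illustration). `<(cmd)` is process substitution: bash replaces it with a file name such as `/dev/fd/63` that yields the output of `cmd`. The first `<` then redirects that file into the loop's standard input. Unlike `cmd | while ...`, the loop runs in the current shell, so variables set inside it are still visible afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical demo: count input lines two ways.

count_with_pipe() {
    local n=0 line
    # In bash each side of a pipe runs in a subshell by default, so the
    # increments happen to a copy of n that is discarded when the loop ends.
    printf 'a\nb\nc\n' | while IFS= read -r line; do n=$((n + 1)); done
    echo "$n"
}

count_with_procsub() {
    local n=0 line
    # The loop runs in the current shell; only printf runs separately.
    while IFS= read -r line; do n=$((n + 1)); done < <(printf 'a\nb\nc\n')
    echo "$n"
}
```

With default bash settings, `count_with_pipe` prints 0 and `count_with_procsub` prints 3.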
1
find . -type f -exec awk 'NF > 0 {print FILENAME; nextfile}' {} \;

This might work. For your second question:

find . -type f -name "*.gz" -exec sh -c 'gunzip -c "$1" | awk -v f="$1" "NF > 0 {print f; exit}"' _ {} \;
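
For the follow-up goal stated in the comments on the question (concatenate and sort the contents of all files, gzipped or not, into one big file), here is a sketch. It assumes a `gunzip` whose `-f` option copies non-gzip input through unchanged when writing to stdout, as GNU gzip does. The function name `merge_sorted` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge_sorted DIR OUTFILE
# Decompress-or-copy every regular file under DIR and write the sorted
# lines to OUTFILE. OUTFILE should live outside DIR, otherwise find
# would pick it up as input.
merge_sorted() {
    local dir=$1 out=$2
    # "-exec ... {} +" hands many file names to each gunzip invocation.
    find "$dir" -type f -exec gunzip -cf -- {} + | sort > "$out"
}
```

A call might look like `merge_sorted ./logs /tmp/all_sorted.txt`, where both paths are placeholders.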
dodrg
0

TXR Lisp:

(ftw "."                                          ;; POSIX nftw function
     (lambda (path type . rest)
       (if (eql type ftw-f)                       ;; if regular file
         (with-stream (s (open-file path "z"))    ;; open with "z" option
           (typecase s                            ;; switch on type of stream
             (gzip-stream                         ;; gzip stream: it's a zipped file 
               (put-line `gzipped file @path`))
             (t                                   ;; any other type: it isn't.
               (put-line `not gzipped file @path`)))))))
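
For comparison with the shell answers, a rough bash equivalent of the classification above, as a sketch: it uses `gzip -t` (integrity test) to decide by content rather than by extension. The function name `classify_files` is invented here. Note that `gzip -t` reads each gzip file in full, so this may be slow on large archives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report, for every regular file under "$1"
# (default: .), whether gzip accepts it as valid gzip data.
classify_files() {
    local dir=${1:-.} file
    while IFS= read -r -d '' file
    do
        # Reading via stdin keeps gzip from caring about the file's suffix.
        if gzip -t < "$file" 2>/dev/null
        then
            printf 'gzipped file %s\n' "$file"
        else
            printf 'not gzipped file %s\n' "$file"
        fi
    done < <(find "$dir" -type f -print0)
}
```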

Kaz
  • just curious what does `TXR Lisp` return for something like `.tgz` or `.tar.gz` files ? – RARE Kpop Manifesto Aug 14 '23 at 13:03
  • ps : i couldn't connect to the TXR Lisp link at all - cannot resolve domain, not even Root Server A – RARE Kpop Manifesto Aug 14 '23 at 13:04
  • @RAREKpopManifesto I fixed the URL. Of course there isn't a top-level domain .txr. :) A .tgz or .tar.gz file is just another gzip file, so the `"z"` mode of `open-file` will recognize it and produce a `gzip-stream` rather than `file-stream`. – Kaz Aug 14 '23 at 16:11
  • i like your macro `https://www.nongnu.org/txr/txr-manpage.html#N-000264BC`, but can you perhaps adjust the formatting a bit, cuz I had to do a lot of horizontal scrolling for some reason – RARE Kpop Manifesto Aug 14 '23 at 23:54
  • @RAREKpopManifesto Are you talking about the TXR manual? Yes, it basically resizes to fit the width of your browser window and has rather small margins. You should never see horizontal scrolling unless you make the window ridiculously narrow. And then, the regular paragraph text will still wrap: only the verbatim code examples might require scrolling. All the code examples stick to 80 columns, so that they look fine when the man page is viewed as a man page in an 80 column terminal. – Kaz Aug 15 '23 at 00:34
  • yeah now it's gone ( and i tested with safari chrome and firefox) – RARE Kpop Manifesto Aug 15 '23 at 02:07