1

I would like to run find on some directory, run awk on each of the files found in that directory, and then replace each original file with its result.

find dir | xargs cat | awk ... | mv ... > filename

So I need the filename (of each of the files found by find) in the last command. How can I do that?

ericj
  • What problem are you running into with this? –  May 25 '16 at 12:24
  • Write a script and pass the filename to it: `find dir -exec myscript {} \;`, where myscript starts with something like `name="$1"` and then does the cat, the awk, etc. – Stefan Hegny May 25 '16 at 12:29
  • `xargs cat` here is pointless (and won't work because it'll combine files). `xargs awk` would be better (and would let `awk` handle each file independently, including running `mv` if you really want). – Etan Reisner May 25 '16 at 13:32
  • Where are you expecting to write the content from `awk` with this sort of pipeline? You need a file for `mv` to operate on (and it doesn't read from standard input). You *could* have `awk` (with `xargs awk`) write to a temporary file, print the new and old names to standard output, and then use `xargs mv` (limited to two entries per command), but that's overly complicated for this. – Etan Reisner May 25 '16 at 13:34
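
The script approach suggested in the comments can also be written inline with `find -exec sh -c`, which hands each filename to a small shell script without any parsing of find's output. The following is only a sketch under assumptions: the function name `rewrite_files` is made up for illustration, the upper-casing awk program is a stand-in for the real processing, and `mktemp` is assumed to be available (it is common but not required by POSIX).

```sh
# Hypothetical helper: run awk on every regular file under "$1" and
# replace each file with awk's output.
rewrite_files() {
    find "$1" -type f -exec sh -c '
        for name in "$@"; do
            # Write the result to a temporary file first, so the original
            # is only replaced once awk has finished successfully.
            tmp=$(mktemp) || exit 1
            # Placeholder awk program: upper-case every line.
            if awk "{ print toupper(\$0) }" "$name" > "$tmp"; then
                # Note: the replaced file takes on the temporary file
                # permissions.
                mv -- "$tmp" "$name"
            else
                rm -f -- "$tmp"
            fi
        done
    ' sh {} +
}
```

Calling `rewrite_files dir` would then process every regular file below dir; because find passes the names as arguments, spaces and newlines in filenames need no special handling.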

2 Answers

0

I would use a loop, like:

for filename in `find . -name "*test_file*" -print0 | xargs -0`
do
    # some processing, then
    echo "what you like" > "$filename"
done

EDIT: as noted in the comments, the benefits of -print0 | xargs -0 are lost because of the for loop. And filenames containing a white space are still not handled correctly.

The following while loop does not handle unusual filenames either (good to know, though it was not part of the question), but it does at least handle filenames containing ordinary spaces, so it works better:

find . -name "*test*file*" -print > files_list

while IFS= read -r filename
do
    # some process
    echo "what you like" > "$filename"
done < files_list
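
The remaining limitation can be made visible with a small experiment. This is a sketch assuming bash (for `read -d ''`, `$'\n'` and process substitution) and a find that supports `-print0`; the demo directory and its filenames are invented for the illustration. It reads the same find results once line by line and once NUL-delimited.

```bash
#!/usr/bin/env bash
# Demo directory with one ordinary name and one name that contains a
# space and a newline (both names are made up for this example).
demo=$(mktemp -d)
touch "$demo/plain_test_file" "$demo/odd test"$'\n'"file"

# Line-based reading: the embedded newline splits one name into two entries.
line_count=0
while IFS= read -r filename; do
    line_count=$((line_count + 1))
done < <(find "$demo" -type f -name '*test*file*' -print)

# NUL-delimited reading: every name arrives intact and can be written to.
nul_count=0
while IFS= read -r -d '' filename; do
    nul_count=$((nul_count + 1))
    echo "what you like" > "$filename"
done < <(find "$demo" -type f -name '*test*file*' -print0)

echo "line-based: $line_count, NUL-delimited: $nul_count"
```

With the two files above, the line-based loop should report three entries and the NUL-delimited loop two, which is why the NUL-delimited form is the safer one when filenames are not under your control.
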
zezollo
  • I know, but I was wondering if you can do this by starting with `find dir`, not with `for f in $(find dir)` – ericj May 25 '16 at 12:34
  • Maybe you could add this detail to your question? Without storing the filenames somewhere, I don't know how you can get them back after several commands. – zezollo May 25 '16 at 12:45
  • `find dir | while read filename` will behave very similarly and *does* start with find dir, but zezollo's answer covers cases, e.g. with newlines in filenames, that would not be handled by this. Still, nobody really understands why you are keen to "start with `find dir`" – Stefan Hegny May 25 '16 at 12:45
  • This answer does *not* handle filenames with newlines/spaces, etc. in the names. Any safety on that front that you gain from `-print0` and `-0` you lose immediately (and worse) by using `for filename in \`...\``. Because you shouldn't [read lines with `for`](http://mywiki.wooledge.org/DontReadLinesWithFor). – Etan Reisner May 25 '16 at 13:30
  • @EtanReisner ...and by redirecting to the unquoted `$filename`. – Benjamin W. May 25 '16 at 13:41
  • @BenjaminW. yes, you're right but quoting `$filename` is not enough – zezollo May 25 '16 at 13:43
  • @zezollo No, I just added to the list. – Benjamin W. May 25 '16 at 13:44
0

You could do something like this (but I wouldn't recommend it at all).

find dir -type f -print0 |
    xargs -0 -n 1 awk -v OFS='\0' -v ORS='\0' '<process the input and write to temporary file>
        END {print "temporaryfile", FILENAME}' |
    xargs -0 -n 2 mv

This passes the files to awk directly, one at a time, so that FILENAME in the END block names the file that was just processed. It avoids the problem with your original, where cat gets hundreds (perhaps more) of files as arguments all at once and sends all their content to awk via standard input in one stream, losing the individual contents and filenames entirely.

awk then writes the processed output to a temporary file and prints the temporary filename and the original filename, both NUL-terminated; the second xargs picks them up two at a time and runs mv on each pair of temporary file/original file names.

As I said at the beginning, however, this is a terrible way to do this.

If you have a new enough version of GNU awk (version 4.1.0 or newer) then you could just load the inplace extension with -i inplace and use (I believe):

find dir -type f -print0 | xargs -0 awk -i inplace '......'

Without that, I would use a while loop of the form shown in Bash FAQ 001 to read the find output entry by entry and operate on each file in the loop.
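
Such a loop might look roughly like the following. This is a sketch rather than a definitive implementation: it assumes bash, a find with `-print0`, and an available `mktemp`; the function name `awk_replace_all` and its two parameters are invented for the example, and the awk program is whatever the caller passes in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the awk program "$2" to every regular file
# under directory "$1", replacing each file with the result.
awk_replace_all() {
    local dir=$1 program=$2 filename tmp
    # Read NUL-delimited names so that spaces and newlines survive.
    while IFS= read -r -d '' filename; do
        tmp=$(mktemp) || return 1
        if awk "$program" "$filename" > "$tmp"; then
            mv -- "$tmp" "$filename"   # replace the original with the result
        else
            rm -f -- "$tmp"            # keep the original if awk failed
        fi
    done < <(find "$dir" -type f -print0)
}
```

For example, `awk_replace_all dir '{ print $2, $1 }'` would swap the first two fields on every line of every file below dir. The filename stays available in `$filename` for the whole loop body, which is what the original pipeline could not provide.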

Etan Reisner