Renaming files based on internal text match - keep all content of file

Question

I'm still having trouble preserving the contents of a file with the following code, which tries to rename the file based on a specific regex match within it (each file always contains exactly one SMILE followed by 12 digits, e.g., SMILE000123456789).

for f in FILENAMEX_*; do awk '/SMILE[0-9]/ {OUT=$f ".txt"}; OUT {print >OUT}' ${f%.*}; done

This code names the file correctly, but the output contains only the lines from the match onward instead of the entire contents of the file.

The files to be processed don't currently have an extension (and they need one for the next step) because I used csplit to split the content out of a larger file.

I don't fully understand your question, but at least one of your problems is getting the shell variable into awk: [How do I use shell variables in an awk script?](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script) — Benjamin W., Oct 17 '17 at 02:07
Thanks Benjamin, that's fixed at least. Basically, the goal is to rename each file based on a particular text match within the file. It is doing that (i.e. renaming the file) but the output file only has the text **after** the match. I still need to preserve the entire file but only have the name changed. Hope that makes more sense. — jnorth, Oct 17 '17 at 02:24
How about (for one file) something like `if grep -q targetText "$file"; then /bin/mv "$file" newFile; else echo "targetText not found in $file"; fi`? You seem to have the `for` loop figured out. Good luck. — shellter, Oct 17 '17 at 02:49

score 2 · Accepted Answer · answered Oct 17 '17 at 03:01

There are two problems: the first is using a shell variable in your awk program, and the second is the logic of the awk program itself.

To use a shell variable in awk, you can use

awk -v var="$var" '<program>'

and then use just var inside of awk.
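A minimal sketch of the `-v` mechanism, with a made-up variable name and input. It also shows why the question's version seemed to "name the file correctly": inside single quotes the shell never expands `$f`, so awk sees its own unset variable `f` (numeric 0), and `$f` becomes field 0, the whole matching line. The output file was therefore named after the entire SMILE line plus `.txt`.

```shell
#!/bin/sh
# Hypothetical shell variable standing in for a file name.
name="FILENAMEX_01"

# -v copies the shell value into the awk variable "name" before the program
# runs; inside awk it is referenced as plain name, with no leading $.
result=$(echo "some line" | awk -v name="$name" '{ print name ".txt" }')
echo "$result"      # prints FILENAMEX_01.txt

# Pitfall from the question: without -v, awk's own f is unset (numeric 0),
# so $f is field 0, i.e. the whole current line, not the shell's $f.
pitfall=$(echo "SMILE000123456789" | awk '{ print $f ".txt" }')
echo "$pitfall"     # prints SMILE000123456789.txt
```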

For the second problem: as long as OUT is not set, a line that doesn't match your pattern is never printed. Only once the first matching line sets OUT does awk start printing, so every line before the match is lost. Since the match might be anywhere in the file, you have to store the lines at least up to the first match.
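The effect is easy to reproduce on a hypothetical three-line input. This is a simplified version of the question's logic (a flag instead of an output file name, printing to stdout instead of a file): the line before the match is dropped.

```shell
#!/bin/sh
# Hypothetical input; the SMILE line is in the middle.
# Same idea as the question: set a flag on the match, print only while it is set.
shown=$(printf 'header\nSMILE000123456789\nfooter\n' |
  awk '/SMILE[0-9]/ { out = 1 } out { print }')
printf '%s\n' "$shown"   # prints SMILE000123456789 and footer; "header" is lost
```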

Here is a version that should work and is pretty close to your approach:

for f in FILENAMEX_*; do
    awk -v f="${f%.*}" '
        !out && /SMILE[0-9]/ {          # First match only
            out=f".txt"
            for (i=1;i<NR;++i)         # Print file so far
                print lines[i] > out
        }
        out { print > out }            # Match has been seen: print
        ! out { lines[NR] = $0 }       # No match yet: store
    ' "$f"
done
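The loop above writes the copy as `<original name>.txt`, while the question's own attempt names the output after the SMILE string itself. If that is what's wanted, one option is to buffer every line and decide at `END`, which removes the need to replay stored lines at the match. This is a sketch assuming a POSIX awk (`match`, `RSTART`, `RLENGTH`); `copy_by_token` is a hypothetical helper name, and two input files containing the same token would overwrite each other's output.

```shell
#!/bin/sh
# copy_by_token FILE (hypothetical helper): copy the full contents of FILE to
# <token>.txt, where token is the first "SMILE<digits>" string in FILE.
# Exits with status 1 and writes nothing if FILE has no such string.
copy_by_token() {
    awk '
        { lines[NR] = $0 }                             # keep every line
        token == "" && match($0, /SMILE[0-9]+/) {
            token = substr($0, RSTART, RLENGTH)        # e.g. SMILE000123456789
        }
        END {
            if (token == "") exit 1                    # no match: report failure
            out = token ".txt"
            for (i = 1; i <= NR; i++) print lines[i] > out
            close(out)
        }
    ' "$1"
}

for f in FILENAMEX_*; do
    [ -f "$f" ] || continue                            # glob matched nothing
    copy_by_token "$f" || echo "no SMILE token in $f" >&2
done
```

Because the whole file is held in memory until `END`, this suits files of modest size such as csplit pieces.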

You could do some trickery and work with FILENAME or similar to do everything in a single invocation of awk, but since the main purpose is to find the presence of a pattern in the file, you're much better off using grep -q, which returns an exit status of 0 if the pattern has been found:

for f in FILENAMEX_*; do grep -q 'SMILE[0-9]' "$f" && cp "$f" "${f%.*}".txt; done

Thanks Benjamin, that is some excellent code. I haven't tried the first block yet - a quick try of the 2nd block curiously outputs the match to the terminal window...? — jnorth, Oct 17 '17 at 03:25
After a bit of struggle, I finally got it working as a variation on @Benjamin W's answer: `for f in FILENAMEX_*; do var=$(grep 'SMILE[0-9]' "$f") && cp "$f" "${var%.*}".txt; done` — jnorth, Oct 17 '17 at 13:43

score 0 · Answer 2 · answered Oct 17 '17 at 03:58


Perhaps take a different approach and just do each step separately.

i.e., in pseudocode:

for all files with some given text
    extract text
    rename file
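A shell sketch of those three steps, assuming (as the question states) that each file holds exactly one `SMILE` followed by 12 digits. It uses POSIX `sed` for the extraction and `mv` for the rename, so the original name disappears, unlike the `cp` in the accepted answer. `rename_by_token` is a hypothetical helper name, and it deliberately refuses to overwrite an existing `<token>.txt`.

```shell
#!/bin/sh
# rename_by_token FILE (hypothetical helper):
#   extract text: the first "SMILE" + 12 digits found in FILE
#   rename file:  mv FILE <token>.txt
# Returns 1 if FILE has no token, 2 if the target name already exists.
rename_by_token() {
    token=$(sed -n 's/.*\(SMILE[0-9]\{12\}\).*/\1/p' "$1" | head -n 1)
    [ -n "$token" ] || return 1
    if [ -e "$token.txt" ]; then return 2; fi
    mv "$1" "$token.txt"
}

# for all files with some given text
for f in FILENAMEX_*; do
    [ -f "$f" ] || continue                  # glob matched nothing
    rename_by_token "$f" || echo "skipped $f" >&2
done
```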

answered Oct 17 '17 at 03:58 by ShoeLace

Yes, perhaps you are right; maybe back to the Perl workhorse. – jnorth, Oct 17 '17 at 12:27
