
Running this command fails:

$(printf "awk '{%sprint}'" $(tail -n +2 file.txt | cut -f2 | sort | uniq | awk 'BEGIN{a=1}{printf "gsub(\"%s\",%i);", $1,a++}')) file.txt

It gives the following error:

awk: '
awk: ^ invalid char ''' in expression

However, if I run just the command inside the outer $(...), it prints this:

awk '{gsub("ACB",1);gsub("ASW",2);gsub("BEB",3);gsub("CDX",4);gsub("CEU",5);gsub("CHB",6);gsub("CHS",7);gsub("CLM",8);gsub("ESN",9);gsub("FIN",10);gsub("GBR",11);gsub("GIH",12);gsub("GWD",13);gsub("IBS",14);gsub("ITU",15);gsub("JPT",16);gsub("KHV",17);gsub("LWK",18);gsub("MSL",19);gsub("MXL",20);gsub("PEL",21);gsub("PJL",22);gsub("PUR",23);gsub("STU",24);gsub("TSI",25);gsub("YRI",26);print}'

which I can run like so:

awk '{gsub("ACB",1);gsub("ASW",2);gsub("BEB",3);gsub("CDX",4);gsub("CEU",5);gsub("CHB",6);gsub("CHS",7);gsub("CLM",8);gsub("ESN",9);gsub("FIN",10);gsub("GBR",11);gsub("GIH",12);gsub("GWD",13);gsub("IBS",14);gsub("ITU",15);gsub("JPT",16);gsub("KHV",17);gsub("LWK",18);gsub("MSL",19);gsub("MXL",20);gsub("PEL",21);gsub("PJL",22);gsub("PUR",23);gsub("STU",24);gsub("TSI",25);gsub("YRI",26);print}' file.txt

And it works perfectly. What am I doing wrong?

@ChrisLear gave me a working solution, but I still don't quite understand why it works. Here's the working code:

$(printf "awk {%sprint}" $(tail -n +2 file.txt | cut -f2 | sort | uniq | awk 'BEGIN{a=1}{printf "gsub(\"%s\",%i);", $1,a++}')) file.txt

The single quotes around {%sprint} are removed. Why do those single quotes break the command substitution?

Edit: changed the backticks to $(...) notation. Also added the working solution, which I don't understand.

BFH
  • The right way to do `awk '{gsub("ACB",1);gsub("ASW",2)}'` btw is `awk 'BEGIN{split("ACB ASW",m)} {for (i in m) gsub(m[i],i)}'` or similar (depends on your requirements) but that doesn't seem to be related to your question. – Ed Morton Jul 05 '17 at 16:00
  • @EdMorton the command is `printf "awk '{%sprint}'" $(...)` with `%s` referring to `$(...)` – BFH Jul 05 '17 at 16:01
  • Ah, I see. I tried formatting your question properly but you had 3 ticks at the start of your printf line so I guessed you wanted to leave one but as written it just doesn't make sense - please edit your question to show the actual command line. Also - add concise, testable sample input and expected output as it's extremely unlikely that what you're doing is the right way to do it (whatever "it" is) and we could help put you on the right path. – Ed Morton Jul 05 '17 at 16:03
  • @EdMorton I used the extra backticks to wrap the long command, but I guess that's not the preferred formatting. – BFH Jul 05 '17 at 16:06
  • https://stackoverflow.com/questions/18567685/why-does-command-substitution-change-how-quoted-arguments-work might be a useful reference – Chris Lear Jul 05 '17 at 16:18

2 Answers


Try removing the quotes from the command being generated.

`printf "awk {%sprint}" $(tail -n +2 file.txt | cut -f2 | sort | uniq | awk 'BEGIN{a=1}{printf "gsub(\"%s\",%i);", $1,a++}')` file.txt

For an explanation, see the accepted answer to "Why does command substitution change how quoted arguments work?" (linked in the comments on the question).
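A way to sidestep the problem entirely is to never let the shell word-split the generated program: capture it in a variable and pass it to awk as one double-quoted argument. This is a sketch, not the answer's method; `apply_codes` is a hypothetical helper name, and it assumes file.txt is tab-separated with a one-line header, as the `tail -n +2 | cut -f2` pipeline in the question implies.

```shell
#!/usr/bin/env bash
# apply_codes (hypothetical helper): replace each distinct column-2 code with
# its 1-based position in sorted order, using the same gsub() list as the question.
# Assumes a tab-separated file with one header line.
apply_codes() {
  local file=$1 prog
  # Generates gsub("ACB",1);gsub("ASW",2);... (NR plays the role of a++).
  prog=$(tail -n +2 "$file" | cut -f2 | sort | uniq |
    awk '{ printf "gsub(\"%s\",%d);", $1, NR }')
  # The program is expanded inside double quotes, so it reaches awk as a
  # single argument and its embedded double quotes pass through untouched.
  awk "{${prog}print}" "$file"
}
```

Usage would be `apply_codes file.txt > new.txt`. Since no quote characters are generated for the shell to interpret, it no longer matters whether the generated program contains spaces.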

Chris Lear
  • This works, but I'm not quite understanding the reason. I looked at the linked answer and didn't quite understand what's going on. – BFH Jul 05 '17 at 16:38
  • The reason your command doesn't work is that after the command substitution (originally backticks) the resulting command has word splitting applied to it, but the quotes within the replaced command are all treated literally. I found various not-great explanations on the web, and the one I linked to was the one I thought did the best job of clarifying the issue. I doubt I'll be able to re-state it in a clearer way, unfortunately. – Chris Lear Jul 06 '17 at 08:18
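The behaviour described in that comment can be seen without awk at all. In this sketch `show_args` is a throwaway helper (not part of any tool) that prints each argument it receives between `<` and `>`, and `prog` stands in for the generated gsub list:

```shell
#!/usr/bin/env bash
# show_args (hypothetical helper): print each argument on its own line as <arg>.
show_args() { printf '<%s>\n' "$@"; }

# Stand-in for the generated list gsub("ACB",1);gsub("ASW",2);...
prog='gsub("ACB",1);'

# 1. Quotes typed on the command line are shell syntax and are removed.
show_args awk '{gsub("ACB",1);print}'
# <awk>
# <{gsub("ACB",1);print}>

# 2. Quotes produced by an unquoted $(...) are plain data: the result is only
#    word-split (and glob-expanded), never re-parsed, so awk sees the ' chars.
show_args $(printf "awk '{%sprint}'" "$prog")
# <awk>
# <'{gsub("ACB",1);print}'>

# 3. Dropping the quotes works only because the generated program contains
#    no whitespace, so word splitting leaves it as a single word...
show_args $(printf "awk {%sprint}" "$prog")
# <awk>
# <{gsub("ACB",1);print}>

# 4. ...a single space would split it into two arguments.
show_args $(printf "awk {%s print}" "$prog")
# <awk>
# <{gsub("ACB",1);>
# <print}>

# 5. eval makes the shell re-parse the text, so the quotes are honoured again
#    (but eval runs whatever the generated text contains; avoid it on untrusted data).
eval "show_args $(printf "awk '{%sprint}'" "$prog")"
# <awk>
# <{gsub("ACB",1);print}>
```

Case 2 is exactly the failing command: awk's program argument starts with a literal `'`, hence "invalid char ''' in expression".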

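It looks like you're trying to take a bunch of unique 2nd fields from a file starting at line 2 and map those to numbers based on their alphabetic ordering, then apply the change to the same file. If so, then with GNU awk 4.0 or later (for sorted_in) that'd be something like the following. The file is read twice, once to collect the values and once to rewrite them, so the result goes to a temp file that is then moved over the original, and -F'\t' matches the tab-separated fields your cut -f2 assumes:

awk -F'\t' -v OFS='\t' '
NR==FNR {
    if (FNR>1) {
        map[$2]
    }
    next
}
FNR==1 {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (str in map) {
        map[str] = ++i
    }
    print
    next
}
{
    $2 = map[$2]
    print
}
' file.txt file.txt > tmp && mv tmp file.txt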

If that's not what you need then edit your question to show concise, testable sample input and expected output.

Ed Morton
  • What I'm trying to do is replace all the strings in a column of a text file with numbers corresponding to each of the unique strings when sorted alphabetically. But my question is a more general one about why the command substitution isn't working. – BFH Jul 05 '17 at 16:17
  • The command line posted in your question doesn't make sense, though, so we can't help you debug it: as written it's invalid syntax because of spurious ticks, probably introduced when both of us tried to format it for this forum. That's why I asked you to edit it so it shows the command you're actually executing. – Ed Morton Jul 05 '17 at 16:20
  • The ticks aren't spurious. I'm trying to do a command substitution. I have confirmed that running the code within the ticks and then copying the output to the next line and appending the file name does exactly what I want to do. So the question is why the command substitution breaks. I tried this solution and it just deletes the second column. – BFH Jul 05 '17 at 16:29
  • But you already know about `$(...)`, so why are you using the old-style backticks it replaced on the outside, i.e. why wrap `cmd1 $(cmd2)` in backticks instead of writing `$(cmd1 $(cmd2))`? Have you tried running shellcheck on your code? – Ed Morton Jul 05 '17 at 16:35
  • That was sloppy of me, but fixing that just produces a different error. @ChrisLear's solution worked, but I'm still not quite understanding the mechanics, and I also don't understand why this solution isn't working. It looks like it should. gawk doesn't recognise `-i inplace`. – BFH Jul 05 '17 at 16:50
  • Edit your question to show the command line using `$(..)`s only and the error message you get from that, so we're not debugging an error that you wouldn't have if you just used the correct command execution syntax. Unfortunately, since you haven't provided input we could test with ourselves, there's only so much we can do to help you debug it. As for `-i inplace`: you're running an extremely old version of gawk. Check `gawk --version` and then update it; we're now on 4.1.4. – Ed Morton Jul 05 '17 at 16:56
  • I don't have superuser access to the machine I'm using, so I can't upgrade gawk. I've updated the question. The gawk version on the shared resource is 3.1.7. – BFH Jul 05 '17 at 17:16
  • Yeah that's about 5 years out of date. I see you updated your question to show the new execution command but it's still the old error message and you said you get a new error message now so just make sure the error message matches the command you're running. And, again, if you provided sample input then we could be of much more help to you as we could then execute the command you're running and see if we can spot the issue. – Ed Morton Jul 05 '17 at 17:19
  • Yeah. I don't know why awk is so out of date on the supercomputer. I reran the new command and got the old error. I must have mistyped something before. – BFH Jul 05 '17 at 17:20
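Since the shared machine only has gawk 3.1.7, which has neither PROCINFO["sorted_in"] (added in gawk 4.0) nor `-i inplace` (gawk 4.1), one option is to let `sort` do the ordering and use only POSIX awk features. A sketch, assuming a tab-separated file with one header line; `rank_column2` is a hypothetical name, and like the original command it writes the result to stdout:

```shell
#!/usr/bin/env bash
# rank_column2 (hypothetical helper): replace column 2 with the 1-based rank of
# its value among the sorted distinct values. Header line is printed unchanged.
# Assumes tab-separated input; uses only POSIX awk, so gawk 3.x is fine.
rank_column2() {
  tail -n +2 "$1" | cut -f2 | sort | uniq |
    awk -F'\t' -v OFS='\t' '
      pass == 1 { rank[$0] = ++n; next }   # sorted keys from stdin: rank = position
      FNR == 1  { print; next }            # header of the data file: unchanged
                { $2 = rank[$2]; print }   # data rows: replace column 2
    ' pass=1 - pass=2 "$1"
}
```

The `pass=1 - pass=2 "$1"` operands are command-line variable assignments that awk applies just before reading the next file, so the script knows which input it is in without relying on NR==FNR (which would misbehave if the key list were empty). Usage: `rank_column2 file.txt > tmp && mv tmp file.txt`.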