
I'm using awk to urldecode some text.

If I hard-code the string in the printf statement, as in printf "%s", "\x3D", it correctly outputs =. The same happens if I put the whole escaped string in a variable.

However, if I only have the 3D, how can I prepend the \x so that printf prints = and not \x3D?

I'm using busybox awk 1.4.2 and the ash shell.

Adrian Frühwirth
Johan

6 Answers


I don't know how to do this in awk, but it's trivial in Perl:

echo "http://example.com/?q=foo%3Dbar" | 
    perl -pe 's/\+/ /g; s/%([0-9a-f]{2})/chr(hex($1))/eig'
zwol
  • Thanks, but perl isn't available. – Johan Sep 16 '10 at 15:27
  • @zwol This only works on Perl 5 if you escape the `+` with a backslash! BTW, works fine for me with sample URLs without the `s/\+/ /g` part at all! The second regex alone will do the trick already. – syntaxerror Jun 27 '15 at 13:15
  • @syntaxerror You're quite right about the `+` needing to be escaped, don't know how I missed that. I think the `?q=phrase+separated+by+plus+signs` notation has gotten less common since I wrote this but it's still part of the [spec for application/x-www-form-urlencoded](http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4) escaping of form submissions. – zwol Jun 27 '15 at 13:29
  • Oh, you're right, I forgot about those form submissions. However, since my main aim is fixing "garbled" download links, the most important thing is to get rid of all this `%20`, `%3D` and `%3F` (et al) stuff in the first place. – syntaxerror Jun 27 '15 at 13:36
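Since Perl isn't available, the same idea as the one-liner above can be sketched in plain awk. This is only a sketch, not part of the answer: urldecode is a hypothetical shell function name, and it assumes a POSIX-compatible awk (it avoids gawk extensions and does not rely on awk reading "0x3D" as hex). Like the Perl version it turns + into a space and %XX into the matching character; codes above 127 may come out differently depending on the awk and the locale.

```shell
#!/bin/sh
# urldecode is a hypothetical helper name: it reads URL-encoded lines on
# stdin, turns "+" into a space and every %XX into the character with
# that hex code, using only POSIX awk features (no gawk, no "0x" tricks).
urldecode() {
    awk '
    BEGIN {
        # Value of each hex digit, in both cases.
        for (i = 0; i < 16; i++) {
            d = substr("0123456789ABCDEF", i + 1, 1)
            hexval[d] = i
            hexval[tolower(d)] = i
        }
    }
    {
        s = $0
        gsub(/[+]/, " ", s)      # form encoding: + stands for a space
        out = ""
        # Scan left to right, so a decoded "%" is never decoded again.
        while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            code = hexval[substr(s, RSTART + 1, 1)] * 16 + hexval[substr(s, RSTART + 2, 1)]
            out = out substr(s, 1, RSTART - 1) sprintf("%c", code)
            s = substr(s, RSTART + RLENGTH)
        }
        print out s
    }'
}
```

For example, echo 'http://example.com/?q=foo%3Dbar' | urldecode prints http://example.com/?q=foo=bar. Because it is a single left-to-right pass, %253D decodes to %3D rather than =.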

GNU awk

#!/usr/bin/awk -nf
# -n (--non-decimal-data) lets chr() turn "0x3D" into 61
@include "ord"
BEGIN {
  RS = "%.."
}
{
  # Print via "%s" so a decoded % (from %25) is not taken as a format
  printf "%s", (RT ? $0 chr("0x" substr(RT, 2)) : $0)
}

Or

#!/bin/sh
awk -niord '{printf "%s", (RT?$0chr("0x"substr(RT,2)):$0)}' RS=%..

Decoding URL encoding (percent encoding)

Zombo

Since you're using ash and Perl isn't available, I'm assuming that you may not have gawk.

For me, using gawk or busybox awk, your second example works the same as the first (I get "=" from both) unless I use the --posix option (in which case I get "x3D" for both).

If I use --non-decimal-data or --traditional with gawk I get "=".

What version of AWK are you using (awk, nawk, gawk, busybox - and version number)?

Edit:

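You can coerce the variable's string value into a numeric one by adding zero. This relies on the awk reading "0x3D" as a hexadecimal number, which busybox awk does; gawk does so only with --non-decimal-data: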

~/busybox/awk 'BEGIN { string="3D"; pre="0x"; hex=pre string; printf "%c", hex+0}'
Dennis Williamson
  • You're right, it does work. I asked the wrong question; I'll amend it. (I'm using busybox awk, version 1.4.2) – Johan Sep 17 '10 at 08:49
  • It took me quite a while to realize that this one-liner handles one variable only, not a whole URL-encoded string (e.g. a web address full of `%20` and `%3F` escapes). – syntaxerror Jun 27 '15 at 13:04
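The \x3D escape is only interpreted when awk reads a string literal in the program (or a -v assignment), so gluing "\\x" onto "3D" at run time just yields the four characters \x3D. If the awk in use does not read "0x3D" as hex (gawk without --non-decimal-data, for example), a portable sketch is to convert the two digits by hand with index(). hexchar is a hypothetical helper name, not something from the answer:

```shell
#!/bin/sh
# hexchar is a hypothetical helper name: it prints the character whose code
# is the two-digit hex string in $1 (3D prints =). The digits are converted
# with index(), so awk never has to interpret a "0x..." string.
hexchar() {
    awk -v hex="$1" 'BEGIN {
        digits = "0123456789ABCDEF"
        h = toupper(hex)
        # index() is 1-based, so subtract 1 to get the digit value 0..15.
        hi = index(digits, substr(h, 1, 1)) - 1
        lo = index(digits, substr(h, 2, 1)) - 1
        printf "%c", hi * 16 + lo
    }'
}
```

For example, hexchar 3D prints = and hexchar 7e prints ~. The same index() arithmetic can replace hex+0 inside a larger awk program.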

This relies on GNU awk extensions (the four-argument form of split() and strtonum()), but it works:

gawk '{ numElems = split($0, arr, /%../, seps);
        outStr = ""
        for (i = 1; i <= numElems - 1; i++) {
            outStr = outStr arr[i]
            outStr = outStr sprintf("%c", strtonum("0x" substr(seps[i],2)))
        }
        outStr = outStr arr[i]
        print outStr
      }'
Joel Jones

I'm aware this is an old question, but none of the answers worked for me (I'm restricted to busybox awk).

Two options. To parse stdin:

awk '{for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y));gsub(/%25/, "%");print}'

To take a command line parameter:

awk 'BEGIN {for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y), ARGV[1]);gsub(/%25/, "%", ARGV[1]);print ARGV[1]}' parameter

%25 has to be handled last; otherwise a string like %253D gets decoded twice (to = instead of %3D).

The inline check for y==38 is needed because gsub treats & in the replacement as the matched text unless it is backslashed.
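Both points are easy to check in isolation with any POSIX awk. The commands below are only an illustration with made-up input strings: the first pair shows a bare & versus "\\&" in a gsub() replacement, the second shows why the order of the %25 and %3D substitutions matters.

```shell
#!/bin/sh
# In a gsub() replacement a bare & means "the matched text",
# so "\\&" (an awk string holding \&) is needed for a literal ampersand.
printf '%s\n' 'a%26b' | awk '{ gsub(/%26/, "&"); print }'    # a%26b (unchanged)
printf '%s\n' 'a%26b' | awk '{ gsub(/%26/, "\\&"); print }'  # a&b

# Decoding %25 first turns %253D into %3D, which the next gsub then
# decodes again; doing %25 last keeps the intended result %3D.
printf '%s\n' '%253D' | awk '{ gsub(/%25/, "%"); gsub(/%3D/, "="); print }'  # =
printf '%s\n' '%253D' | awk '{ gsub(/%3D/, "="); gsub(/%25/, "%"); print }'  # %3D
```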

Whinger

This one is the fastest of them all by a large margin and it doesn't need gawk:

#!/usr/bin/mawk -f

function decode_url(url,            dec, tmp, pre, mid, rep) {
    tmp = url
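    while (match(tmp, /%[0-9a-fA-F][0-9a-fA-F]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART + 1, RLENGTH - 1)
        # ("0x" mid) + 0 needs an awk that reads "0x3D" as hex
        # (gawk only does so with --non-decimal-data).
        rep = sprintf("%c", ("0x" mid) + 0)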
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}

{
    print decode_url($0)
}

Save it as decode_url.awk, make it executable and use it like any other filter. E.g.:

$ echo 'Hello%2C%20world%20%21' | ./decode_url.awk
Hello, world !

But if you want an even faster version:

#!/usr/bin/mawk -f

function gen_url_decode_array(      i, n, c) {
    delete decodeArray
    # Map %XX and %xx for every printable ASCII character (32..126).
    for (i = 32; i < 127; ++i) {
        c = sprintf("%c", i)
        n = sprintf("%%%02X", i)
        decodeArray[n] = c
        decodeArray[tolower(n)] = c
    }
}

function decode_url(url,            dec, tmp, pre, mid, rep) {
    tmp = url
    while (match(tmp, /%[0-9a-fA-F][0-9a-fA-F]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART, RLENGTH)
        # Keep escapes that have no table entry instead of dropping them.
        rep = (mid in decodeArray) ? decodeArray[mid] : mid
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}

BEGIN {
    gen_url_decode_array()
}

{
    print decode_url($0)
}

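Other awk interpreters should run these too, with one caveat: the first version relies on ("0x" mid) + 0 being read as hexadecimal, which gawk does only with --non-decimal-data (-n).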