One of the columns in my file is URL-encoded. I have to decode that column and then perform some operations based on the values inside it. Is there any way I can decode that column in awk?
- This question is NOT a duplicate of the one mentioned, whose title is wrong. – Walter Tross Oct 02 '13 at 14:01
- This is not a duplicate. This question's answer solved a problem for me that the other one did not. – C. Ross Aug 27 '14 at 17:29
You have to adapt it depending on your file format, but the basic principle is here (tested with GNU Awk 3.1.7):
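```
sh$ echo 'Hello%2C%20world%20%21' | awk '
{
    # Try every code from %20 to %3F (0x20..0x3F), upper- and lowercase hex
    for (i = 0x20; i < 0x40; ++i) {
        repl = sprintf("%c", i);
        # & and \ are special in a gsub replacement string, so escape them
        if ((repl == "&") || (repl == "\\"))
            repl = "\\" repl;
        gsub(sprintf("%%%02X", i), repl);   # e.g. %2C
        gsub(sprintf("%%%02x", i), repl);   # e.g. %2c
    }
    print
}
'
Hello, world !
```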
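One caveat when only some columns are encoded: the loop above calls `gsub` on the whole record, and modifying `$0` makes awk split it into fields again. With a field separator such as `&`, a decoded `%26` then starts a new field. A minimal sketch, assuming `&`-separated input like the sample in the comments below (the shell function name `decode_fields` is hypothetical), decodes each field in place instead; assigning to a field rebuilds `$0` but does not re-split it. It uses decimal loop bounds so it does not depend on hex constants in the awk source.

```shell
# Hypothetical helper: decode every &-separated field of each line,
# then print fields 2 and 3 separated by a tab.
decode_fields() {
  awk 'BEGIN { FS = "&"; OFS = "\t" }
  {
    for (f = 1; f <= NF; ++f) {
      # 32..63 is 0x20..0x3F, the same range as the loop in the answer
      for (i = 32; i < 64; ++i) {
        repl = sprintf("%c", i)
        # & and \ are special in a gsub replacement, so escape them
        if (repl == "&" || repl == "\\") repl = "\\" repl
        # Targeting $f (not $0) keeps the original field boundaries
        gsub(sprintf("%%%02X", i), repl, $f)
        gsub(sprintf("%%%02x", i), repl, $f)
      }
    }
    print $2, $3
  }'
}
```

For example, `echo '1370474740&http%3a%2f%2fwww.xxxx.com%2fiphone%2fiphone-3g&et%3da%26ago%3d212&13456' | decode_fields` should print `http://www.xxxx.com/iphone/iphone-3g` and `et=a&ago=212`, separated by a tab, with the decoded `&` still inside the third field.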
If you have gawk, you can wrap that in a function (credit to brendanh in a comment below):
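```
# i and repl are listed as extra parameters so they stay local;
# otherwise awk treats them as globals shared with the caller
function urlDecode(url,    i, repl) {
    for (i = 0x20; i < 0x40; ++i) {
        repl = sprintf("%c", i);
        if ((repl == "&") || (repl == "\\")) {
            repl = "\\" repl;
        }
        url = gensub(sprintf("%%%02X", i), repl, "g", url);
        url = gensub(sprintf("%%%02x", i), repl, "g", url);
    }
    return url;
}
```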
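`gensub` is gawk-only. Because `gsub` can also modify a local variable, the same function can be sketched without it so that it should also run in other POSIX awks such as mawk. This sketch assumes tab-separated input with the encoded data in column 2; the shell wrapper name `url_decode_col` is hypothetical.

```shell
# Hypothetical wrapper: URL-decode column 2 of tab-separated input.
url_decode_col() {
  awk -F'\t' -v OFS='\t' '
  # Same 0x20..0x3F loop as the gawk urlDecode, but gsub edits the
  # local copy "url" in place, so gensub (gawk-only) is not needed.
  function urlDecode(url,    i, repl) {
    for (i = 32; i < 64; ++i) {          # 0x20 .. 0x3F
      repl = sprintf("%c", i)
      if ((repl == "&") || (repl == "\\"))
        repl = "\\" repl
      gsub(sprintf("%%%02X", i), repl, url)
      gsub(sprintf("%%%02x", i), repl, url)
    }
    return url
  }
  { $2 = urlDecode($2); print }'
}
```

Usage: `url_decode_col < data.tsv`, where `data.tsv` stands for your own file. Only column 2 is decoded; the other columns pass through unchanged.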

Sylvain Leroux
- My string looks like this: `http%3a%2f%2fwww.gazelle.com%2fiphone%2fiphone-3g`. The above operation couldn't decode it. :( – MikA Jun 08 '13 at 19:48
- Obviously, I used the format `%02X`, which only matches percent-encoding with _uppercase_ hex digits, like `http%3A%2F...`. I modified the sample code to convert lowercase percent-encoding too. Now it should work with both... at least for codes below `%40` (the upper limit of the for loop). You might have to adjust that... – Sylvain Leroux Jun 08 '13 at 20:03
- My string looks like this: `1370474740&http%3a%2f%2fwww.xxxx.com%2fiphone%2fiphone-3g&et%3da%26ago%3d212%26ao%3d219%26px%3d73%26av1%3d2%26av2%3dOrganicSearch&13456`. When I use awk like this: `awk 'BEGIN {FS = "&"} {for (i = 0x20; i < 0x40; ++i) gsub(sprintf("%%%02x", i), sprintf("%c", i));print $1,$2,$3}'`, `%26` (which is `&`) is not getting converted. Why? – MikA Jun 08 '13 at 20:33
- This one was tough! I had forgotten that `&` and `\` have special meaning in the replacement string for `gsub`. It is fixed in the answer (I hope). – Sylvain Leroux Jun 08 '13 at 21:26
- FWIW, a slightly modified gawk-only version of the answer, as a function (now included in the answer as `urlDecode`). – brendanh Nov 15 '14 at 23:15
- @brendanh I took the liberty of adding your function to my answer. If you do not agree with that, please feel free to revert that edit. – Sylvain Leroux Nov 16 '14 at 10:43
- While this function works, it's quite slow. I found a much faster one here: https://github.com/Knorkebrot/werc/blob/master/bin/contrib/urldecode.awk – Ruslan Talpa Jan 14 '15 at 08:24
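Part of the slowness is that the loop runs 64 `gsub` calls per string, one per code and case. A different technique, sketched here and not necessarily what the linked script does, scans the string once and converts each `%XX` with a hex-digit lookup. Because every escape is decoded exactly once, it also avoids a side effect of the loop, which turns `%2532` first into `%32` and then into `2`. It covers all byte values, not only 0x20 to 0x3F. The names `url_decode` and `pct_decode` are hypothetical, and `LC_ALL=C` is set on the assumption that gawk in a UTF-8 locale would otherwise emit codes above 0x7F as multibyte characters rather than raw bytes.

```shell
# Hypothetical helper: single-pass percent-decoding of each input line.
url_decode() {
  LC_ALL=C awk '
  # Value of one hex digit in either case, or -1 if it is not hex
  function hexval(c) { return index("0123456789abcdef", tolower(c)) - 1 }

  # Walk the string left to right; each %XX is decoded exactly once,
  # so %2532 becomes %32 rather than 2.
  function pct_decode(s,    out, i, n, c, hi, lo) {
    out = ""
    n = length(s)
    for (i = 1; i <= n; ++i) {
      c = substr(s, i, 1)
      if (c == "%" && i + 2 <= n) {
        hi = hexval(substr(s, i + 1, 1))
        lo = hexval(substr(s, i + 2, 1))
        if (hi >= 0 && lo >= 0) {
          out = out sprintf("%c", hi * 16 + lo)
          i += 2              # skip the two hex digits
          continue
        }
      }
      out = out c             # ordinary or malformed input: copy as-is
    }
    return out
  }
  { print pct_decode($0) }'
}
```

For example, `echo 'http%3a%2f%2fwww.gazelle.com%2fiphone%2fiphone-3g' | url_decode` should print `http://www.gazelle.com/iphone/iphone-3g`. To decode a single column, call `pct_decode($2)` (or whichever field) instead of `pct_decode($0)`.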