Regex with awk or gawk

Question

I'm a beginner with awk/gawk. When I run the command below, it prints nothing. Please help!

echo "A=1,B=2,3,C=,D=5,6,E=7,8,9"|awk 'BEGIN{
n = split($0, arr, /,(?=\\w+=)/)
for (x=1; x<n; x++) printf "arr[%d]=%s\n", x, arr[x]
}'


I am trying to parse:

A=1,B=2,3,C=,D=5,6,E=7,8,9

Expected Output:

A=1
B=2,3
C=
D=5,6
E=7,8,9

I bet there's something wrong with my awk.

Good question with self-documenting test case. Keep posting and Good Luck! — shellter, Feb 07 '13 at 22:08

score 4 · Answer 1 · answered Feb 07 '13 at 22:12


gawk doesn't support look-ahead: `(?=...)` is Perl/PCRE syntax, and awk regular expressions are POSIX EREs.

If you want gawk to parse it as you expect, try this (`gensub()` is a gawk extension, so run it with gawk):

awk '{n=split(gensub(/,([A-Z])/, " \\1","g" ),arr," ");for(x=1;x<=n;x++)print arr[x]}'

Tested with your example:

kent$  echo "A=1,B=2,3,C=,D=5,6,E=7,8,9"|awk '{n=split(gensub(/,([A-Z])/, " \\1","g" ),arr," ");for(x=1;x<=n;x++)print arr[x]}'
A=1
B=2,3
C=
D=5,6
E=7,8,9

answered Feb 07 '13 at 22:12

Kent


Ooh, I like that. I was trying to figure out how to split, but I didn't think of using `gensub()` and using the backreference to keep the part we want to keep. +1. – steveha Feb 07 '13 at 22:29
@steveha thx. The first thing that came to mind was `sed -r 's/,([A-Z])/ \1/g'|awk 'simple split..'`, but I thought it would be nicer to do it in a single process. – Kent Feb 07 '13 at 22:43

score 3 · Answer 2 · answered Feb 07 '13 at 22:04


This might be easier with sed (GNU sed: `\w`, `\+` and `\n` in the replacement are GNU extensions):

$ echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" | sed 's/,\(\w\+=\)/\n\1/g'
A=1
B=2,3
C=
D=5,6
E=7,8,9
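If GNU sed is not available, a POSIX-only sketch of the same substitution could look like this. It assumes keys look like identifiers (a letter or underscore followed by letters, digits or underscores), since `\w` and `\+` are not POSIX; the newline in the replacement is written as a backslash followed by a literal line break.

```shell
# POSIX sed: a backslash followed by a real newline inserts a newline in the replacement.
# Assumption: keys match [A-Za-z_][A-Za-z0-9_]* (an identifier-like name).
echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" |
sed 's/,\([A-Za-z_][A-Za-z0-9_]*=\)/\
\1/g'
```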

answered Feb 07 '13 at 22:04

Andrew Clark


score 2 · Answer 3 · answered Feb 08 '13 at 17:40


If you are using GNU awk (a regular-expression RS and the RT variable are gawk extensions), you could do:

awk '{printf "%s\n%s", $0, substr(RT, 2)}' RS=',[A-Z]|\n'

answered Feb 08 '13 at 17:40

William Pursell


I was also thinking of using RS but didn't know how to get the matched text with RT :) – Mirage Feb 08 '13 at 23:54

score 1 · Answer 4 · answered Feb 07 '13 at 22:04


As nhahtdh said, there is no look-ahead in awk. But you can use a different separator between the assignments: why not "A=1;B=2,3,4;C=5..."? If your input must keep that format, try flex.
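A sketch of what the suggested alternative buys you, using hypothetical input where `;` separates the assignments: since `;` never occurs inside a value, ordinary field splitting (`-F';'`, POSIX awk) is enough, and each field can be cut at its first `=`.

```shell
# Hypothetical input: ";" between assignments instead of ",".
echo "A=1;B=2,3;C=;D=5,6;E=7,8,9" | awk -F';' '{
    for (i = 1; i <= NF; i++) {
        eq = index($i, "=")    # position of the first "=" in this field
        printf "key %s, value [%s]\n", substr($i, 1, eq - 1), substr($i, eq + 1)
    }
}'
```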

answered Feb 07 '13 at 22:04

Rui Brito


score 1 · Answer 5 · answered Feb 07 '13 at 23:30


You could also use a comma as the record separator:

echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" |
awk -v RS=, '{sep=","} /=/ {sep="\n"} NR==1 {sep=""} {printf "%s%s", sep, $0}'

outputs

A=1
B=2,3
C=
D=5,6
E=7,8,9
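The same `RS=,` idea can also rebuild key/value pairs instead of only printing them. This is a sketch assuming a value never contains `=` itself: a record containing `=` starts a new pair, and a record without one is glued back onto the previous value with the comma that RS consumed. The `order` and `val` array names are chosen here for illustration.

```shell
echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" |
awk -v RS=, '
    { sub(/\n$/, "") }                     # the last record still carries the newline
    /=/ {                                  # a new KEY=VALUE pair
        eq = index($0, "=")
        key = substr($0, 1, eq - 1)
        order[++n] = key                   # remember input order; for-in order is unspecified
        val[key] = substr($0, eq + 1)
        next
    }
    { val[key] = val[key] "," $0 }         # continuation of the previous value
    END { for (i = 1; i <= n; i++) printf "%s -> [%s]\n", order[i], val[order[i]] }'
```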

answered Feb 07 '13 at 23:30

glenn jackman


steveha · Answer 6 · answered Feb 07 '13 at 22:33

You have two problems. First, you don't want a BEGIN clause: BEGIN runs before any input is read, so $0 is still empty there. You just want this to run on every input line. Second, you are trying to use regular expression features (the `(?=...)` look-ahead) that AWK does not support.
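A quick way to see the first problem: a BEGIN action runs before awk reads any input, so `$0` is empty there, while the main rule sees each line. A minimal demonstration:

```shell
# BEGIN runs once, before the first line is read; the unconditioned rule runs per line.
echo "A=1,B=2" | awk 'BEGIN { print "BEGIN sees [" $0 "]" } { print "main rule sees [" $0 "]" }'
```

This prints `BEGIN sees []` followed by `main rule sees [A=1,B=2]`.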

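Instead of trying to use a fancy pattern that splits the string, loop and call match() to pull out the key/value pairs you want. (The three-argument form of match(), which fills an array with the captured groups, is a gawk extension.)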

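echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" | gawk '
{
    line = $0
    # Three-argument match() (gawk only) stores the groups in arr[1], arr[2], ...
    while (match(line, /([A-Z]+)=([0-9,]*)(,|$)/, arr))
    {
        key = arr[1]
        value = arr[2]
        # Continue scanning right after the text this match consumed.
        line = substr(line, RSTART + RLENGTH)
        # No single quotes in the format: they would end the shell-quoted program.
        printf "DEBUG: key %s value %s\n", key, value
    }
}'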

This prints:

DEBUG: key A value 1
DEBUG: key B value 2,3
DEBUG: key C value
DEBUG: key D value 5,6
DEBUG: key E value 7,8,9
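The three-argument match() only works in gawk. A portable sketch for POSIX awk can use the two-argument form, which just sets RSTART and RLENGTH, and cut each pair apart with index() and substr(). It keeps the same assumption as the pattern above (uppercase keys, values made of digits and commas) and relies on POSIX leftmost-longest matching; appending a comma to the line makes the last pair end like the others.

```shell
echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" | awk '
{
    line = $0 ","                                  # every pair now ends in ","
    while (match(line, /[A-Z]+=[0-9,]*,/)) {
        pair = substr(line, RSTART, RLENGTH - 1)   # the pair without its trailing comma
        eq = index(pair, "=")
        printf "key %s value [%s]\n", substr(pair, 1, eq - 1), substr(pair, eq + 1)
        line = substr(line, RSTART + RLENGTH)      # keep scanning after this pair
    }
}'
```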

score 0 · Answer 7 · answered Feb 08 '13 at 03:04


Another way, using gawk's gensub() (temp.txt holds the input line):

awk '{print gensub(/,([A-Z]+=)/, "\n\\1","g")}' temp.txt

Output

A=1
B=2,3
C=
D=5,6
E=7,8,9
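gensub() is a gawk extension. A portable sketch of the same edit for POSIX awk could call gsub() twice: gsub() has no backreferences, but `&` in the replacement stands for the whole match, so a newline can be inserted before each `,KEY=` and the now-redundant comma removed afterwards. This assumes each record is a single line, so the only `\n,` sequences are the ones just inserted.

```shell
echo "A=1,B=2,3,C=,D=5,6,E=7,8,9" | awk '{
    gsub(/,[A-Z]+=/, "\n&")   # ",B=" becomes newline + ",B="
    gsub(/\n,/, "\n")         # drop the comma that now follows each inserted newline
    print
}'
```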

answered Feb 08 '13 at 03:04

Mirage

