Suppose I have multi-line records with = as the record separator, but only when the = is at the start of a line:
$ cat file
record 1, field 1
record 1, field 2 with a = in it
record 1, field 3
= record 2, field 1
record 2, field 2 also with a = in it
record 2, field 3
= final record 3, field 1
record 3, field 2
I would like to separate a file similar to this into records delimited by ^=[ \t] and fields delimited by \n.
I tried:
$ gawk -v RS="^=[ \t]" -v FS="\n" '{printf "%s\n--- NF=%s, NR=%s ---\n", $0, NF, FNR}' file
but that results in:
record 1, field 1
record 1, field 2 with a = in it
record 1, field 3
= record 2, field 1
record 2, field 2 also with a = in it
record 2, field 3
= final record 3, field 1
record 3, field 2
--- NF=9, NR=1 ---
i.e., the ^ does not act as a beginning-of-line anchor the way I expected, so the whole file comes back as a single record.
I know I can do:
$ gawk -v RS="\n=[ \t]" -v FS="\n" '{printf "%s\nNF=%s, NR=%s\n", $0, NF, FNR}' file
But that feels like it would have Unix / Windows issues with line separators. It also leaves an extra \n attached to the final record.
I could use sed to replace the ^=[ \t] with an extra \n and then use gawk in paragraph mode:
$ sed 's/^=[ \t]/\
/' file | gawk -v RS="" -v FS="\n" '{printf "%s\n--- NF=%s, NR=%s ---\n", $0, NF, FNR}'
record 1, field 1
record 1, field 2 with a = in it
record 1, field 3
--- NF=3, NR=1 ---
record 2, field 1
record 2, field 2 also with a = in it
record 2, field 3
--- NF=3, NR=2 ---
final record 3, field 1
record 3, field 2
--- NF=2, NR=3 ---
Which is precisely what I am looking for.
Question: Is there a way to use ^ in RS to indicate 'start of the line' in gawk with multiline records, so I don't have to pipe through sed? I guess I am looking for the gawk equivalent of the m flag in a PCRE regex.
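For comparison, one fallback that needs neither sed nor a regex RS is to assemble the records line by line. This is only a minimal sketch under my own assumptions, not the RS-based solution asked about: the function names split_records and emit and the array rec are made up for illustration, and stripping a trailing \r is my guess at how to make CRLF files behave the same as LF files.

```shell
# Sketch: start a new record at every line that begins with "=" followed by
# a space or tab. Uses ordinary line-by-line awk, so it does not depend on
# how RS treats ^. split_records, emit and rec are hypothetical names.
split_records() {
    awk '
        { sub(/\r$/, "") }            # tolerate Windows (CRLF) line endings
        /^=[ \t]/ {                   # separator seen at the start of a line
            if (n) emit()             # print the record collected so far
            sub(/^=[ \t]/, "")        # drop the separator itself
        }
        { rec[++n] = $0 }             # every line is one field of the record
        END { if (n) emit() }         # print the last record, if any

        function emit(   i) {
            for (i = 1; i <= n; i++) print rec[i]
            printf "--- NF=%d, NR=%d ---\n", n, ++nr
            n = 0
        }
    ' "$@"
}
```

It would be called as split_records file. Unlike the sed plus paragraph-mode pipeline, blank lines in the input stay inside the current record here instead of ending it, which may or may not be what is wanted.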