Is there a way in awk—gawk most likely—to set the record separator RS to empty value to process each character of a string as a separate record? Kind of like setting the FS to empty to separate each character in its own field:

$ echo abc | awk -F '' '{print $2}'
b

but to separate them each as a separate record, like:

$ echo abc | awk -v RS='?' '{print $0}'
a
b
c

The most obvious one:

$ echo abc | awk -v RS=''  '{print $0}'
abc

didn't work for me (that setting is apparently meant for something else, per the GNU awk documentation).

Am I basically stuck using for etc.?

EDIT:

@xhienne's answer was what I was looking for but even using that (20 chars and a questionable variable A :):

$ echo  abc | awk -v A="\n" -v RS='(.)' -v ORS="" '{print(RT==A?NR:RT)}'
abc4

wouldn't help me shorten my earlier code using length. Then again, how could I win the Pyth code: +Qfql+Q :D.

James Brown

3 Answers

If you just want to print one character per line, @klashxx's answer is OK. But a sed 's/./&\n/g' would be shorter since you are golfing.

If you truly want a separate record for each character, the closest solution I have found is:

echo -n abc | awk -v RS='(.)' '{ print RT }'

(use gawk; your input character is in RT, not $1)

[update] If RS is set to the null string, awk takes it to mean that records are separated by blank lines. If I had just defined RS='.', the record separator would have been a literal dot (i.e. a fixed string). But when RS is longer than one character, gawk treats it as a regex. So what I did here is give gawk a regex meaning "each character" as the record separator. I also use another gawk feature: the special variable RT (record terminator) holds the string that matched the regex.

Here are the relevant parts of the gawk manual:

Normally, records are separated by newline characters. You can control how records are separated by assigning values to the built-in variable RS. If RS is any single character, that character separates records. Otherwise, RS is a regular expression. Text in the input that matches this regular expression separates the record.

If RS is set to the null string, then records are separated by blank lines.

Gawk sets RT to the input text that matched the character or regular expression specified by RS.
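To check that each character really becomes its own record, one can print NR next to RT. This is only a minimal sketch: it assumes GNU awk is installed under the name gawk, and the function name chars_as_records is made up for illustration.

```shell
# Sketch, assuming GNU awk is available as "gawk" (RT is a gawk extension).
# chars_as_records is a hypothetical helper name, not part of any tool.
chars_as_records() {
    # printf '%s' avoids the trailing newline, which would be one more record.
    # RS='(.)' is longer than one character, so gawk treats it as a regex.
    # Every record is empty; the character itself arrives in RT.
    printf '%s' "$1" | gawk -v RS='(.)' '{ print NR ": " RT }'
}
```

Calling chars_as_records abc should number the characters 1 to 3, which shows NR advancing once per character.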

xhienne
  • xhienne, Thanks a lot for this code sharing. Just wanted to mention here it gives 2 new lines at last of output. – RavinderSingh13 Dec 23 '16 at 13:11
  • @RavinderSingh That's the ending newline, which is a separate record. Changing `echo` to `echo -n` avoids it. – xhienne Dec 23 '16 at 13:13
  • This is pretty good, just ran to `RT` myself in the documentation. I probably could live with this one but can't say it aloud, otherwise I won't get any other answers (_did I say that?!_). I must get on bottom of this. – James Brown Dec 23 '16 at 13:39
  • @EdMorton POSIX is not the point here since it was stated explicitly in the OP that `gawk` was an option. Keep in mind the final purpose of this question (codegolf). As for your assumption on my understanding of the parentheses in `(.)`, the whole purpose was to force `RS` to be considered as a regex by `gawk`. Now that you know, do not just say I'm wrong, prove it. – xhienne Dec 23 '16 at 16:10
  • @EdMorton First, you merely stated that `echo -n` was not POSIX compliant, nothing more. But I was not commenting on this. I was referring to the first part of your comment, something as presumptuous as "Whatever you think your parentheses are meant to do, you are wrong. **They only make your command non-POSIX compliant**" (IIRC). That's why I'm asking you for evidence to support your claim that I don't understand my own answer (and that you understand it better than I do). – xhienne Dec 23 '16 at 19:59
  • @EdMorton Your (deleted) comment started as I stated, and that explains perfectly my first answer to you. I understand now that your unpleasant wording was your (awkward) way to say "the parens are unnecessary", as your denial here is probably your (awkward) way to say you are sorry for that. So be it. Let's forget this and move to something else. :-) – xhienne Dec 23 '16 at 22:27

It is not possible

The empty string "" (a string without any characters) has a special meaning as the value of RS. It means that records are separated by one or more blank lines and nothing else.
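A minimal sketch of that special meaning, sometimes called paragraph mode. The sample input and the function name paragraph_records are made up for illustration.

```shell
# RS="" selects paragraph mode: records are separated by blank lines,
# and a newline inside a record also acts as a field separator.
# paragraph_records is a hypothetical helper name.
paragraph_records() {
    printf 'a b\nc\n\nd\n' | awk -v RS='' '{ print NR ": " NF " fields, first=" $1 }'
}
paragraph_records
```

The first three lines' worth of input collapse into a single record with three fields, and d forms the second record, so an empty RS never splits a line into characters.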

A simple alternative that prints one character per line:

echo abc | awk  'BEGIN{FS="";OFS="\n"}$1=$1'
Juan Diego Godoy Robles
  • Under normal circumstances that would suffice but I've gone [golfing](http://codegolf.stackexchange.com/) – James Brown Dec 23 '16 at 11:12
  • I see : ) @JamesBrown – Juan Diego Godoy Robles Dec 23 '16 at 11:30
  • @JamesBrown This does not answer your question as it does not make a **separate record** with each character. It just prints one character on a separate line. NR is still 1 in the end. Is that really what you want? – xhienne Dec 23 '16 at 12:31
  • You should mention that this relies on undefined behavior (per POSIX) and so will only work on some awks, and that it'll fail if the first character on a line is `0` (zero) since it relies on the value of `$1` to invoke the default behavior of printing the current record (try `echo '0bc' | ...`). – Ed Morton Dec 23 '16 at 21:22
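The leading-zero pitfall from the last comment can be reproduced, and sidestepped, roughly as follows. This is a sketch that assumes an awk which splits the record into characters when FS is empty (gawk and mawk do; POSIX leaves it undefined). The function name split_chars is made up for illustration.

```shell
# Pattern-only form: the value of $1 decides whether the record is printed,
# so a line starting with 0 is silently dropped.
echo 0bc | awk 'BEGIN{FS="";OFS="\n"}$1=$1'

# Rebuild the record inside an action, then print unconditionally
# with the always-true pattern 1. split_chars is a hypothetical name.
split_chars() {
    awk 'BEGIN{FS="";OFS="\n"}{$1=$1}1'
}
echo 0bc | split_chars
```

The second form costs a few extra characters, which matters for golfing but not for ordinary scripts.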

No, there is no setting of RS that will do what you want. It looks like your requirement is to append a newline after every character that is not a newline; if so, this will produce the output you want:

$ echo 'abc' | awk -v ORS= 'gsub(/[^\n]/,"&\n")'
a
b
c

That will work on any awk on any UNIX system.
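If the goal is to process each character rather than just print it, the loop the question alludes to is the portable route. A minimal sketch using only length() and substr(); the function name each_char is made up for illustration.

```shell
# Portable sketch: visit each character of every input line in a loop.
# each_char is a hypothetical helper name.
each_char() {
    printf '%s\n' "$1" | awk '{
        n = length($0)
        for (i = 1; i <= n; i++)
            print i ": " substr($0, i, 1)   # position and character
    }'
}
each_char abc
```

Here the loop index i plays the role that NR would play if each character were a real record.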

Ed Morton