
I need to parse stdin in the following way:

(1) all newline characters must be replaced with \n (a literal \ followed by n)

(2) nothing else in the input may be changed

I chose awk to do it, and I would like an answer that uses awk if possible.

I came up with:

echo -ne "A\nB\nC" | awk '{a[NR]=$0;} END{for(i=1;i<NR;i++){printf "%s\\n",a[i];};printf "%s",a[NR];}'

But it looks cumbersome.

Is there a better / cleaner way?

robertspierre
  • `echo -ne "A\nB\nC" | sed -z 's/\n/\\n/g'`? – Cyrus Jan 22 '23 at 09:37
  • `awk 'BEGIN{ORS="\\n"}1'`? – Cyrus Jan 22 '23 at 09:39
  • @Cyrus The `sed` solution works, thanks. The `awk` one doesn't, as it adds a trailing `\n`. – robertspierre Jan 22 '23 at 09:44
  • `echo -ne "A\nB\nC"` outputs a string without a terminating newline, which means it's not a valid text file, and so YMMV with using any text processing tool on it. – Ed Morton Jan 22 '23 at 15:02
  • @glennjackman no the last character of the string is `C`, not a new line – robertspierre Jan 22 '23 at 19:59
  • If you just want to create a string that is readable by any shell, you could just use `printf "%q"`. Example `printf "%q" "$(echo -ne "A\nB\nC")"` – kvantour Jan 23 '23 at 11:14
  • @EdMorton I don't know whether there is some sort of standard for text files. I have read it is good practice for it to end with a newline, but I don't know whether that's coded somewhere. Anyway, that is the input we have. We can complain about it all we want: it's not going to change. – robertspierre Jan 23 '23 at 11:33
  • @kvantour I don't understand what you mean by "readable by any shell" – robertspierre Jan 23 '23 at 11:33
  • A string that can be passed on the command line interface (such as the `$'A\nB\nC'` or `$'A\r\bB'` or `$'A\tB'`) ( See https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html ) – kvantour Jan 23 '23 at 11:40
  • @robertspierre the standard for text files is at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403 and for text lines (which a text file is made up of) is at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206. The standards for text processing tools like awk, sed, grep, etc. all say some variation of "Input files shall be text files" (e.g. under INPUT FILES at https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) so when you don't have that then YMMV. – Ed Morton Jan 23 '23 at 13:32
  • Although you have data that doesn't end in a terminating newline and so isn't a valid text file, that doesn't mean that's what you need to use as input to a text processing tool, you could add a newline, e.g. `{ cat file; printf '\n'; } | awk '...'` or various other approaches. In practice I expect most versions of awk and sed to be able to handle no terminating newline (not sure about grep) in some fashion but other tools will surprise you. – Ed Morton Jan 23 '23 at 14:02
  • @Cyrus: if you really want `awk 'BEGIN{ORS="\\n"}1'`, then you might as well use `awk 'ORS = "\\n"'` or `awk 7 ORS='\\n'` – RARE Kpop Manifesto Jan 27 '23 at 23:14
  • @robertspierre: the really amateur-hour workaround I use in my own code for this is something like `gcat - <( gprintf %s\\n '|_fakeEOFfake_|' )`: using `cat` to append an artificial EOF-like marker string, regardless of how the input actually ends, and having the `awk` code remove its effects afterwards. I know, I know: bad coding style and all. – RARE Kpop Manifesto Jan 27 '23 at 23:21
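
Ed Morton's suggestion in the comments above (append a newline so that awk always receives a valid text file) can be combined with the question's "separator between lines" idea. The following is only a sketch under that assumption, not something posted in the thread; the function name `escape_newlines` is hypothetical. Because exactly one newline is added, printing the escaped separator before every line except the first reproduces the original newlines one for one, without storing the lines in an array.

```shell
# Hypothetical helper: escape every newline on stdin as a literal \n, using plain awk.
escape_newlines() {
    # Append one newline so awk always sees complete lines, then print
    # the separator *before* every line except the first.
    { cat; printf '\n'; } |
        awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

printf 'A\nB\nC'   | escape_newlines; echo    # input without a trailing newline
printf 'A\nB\nC\n' | escape_newlines; echo    # input with a trailing newline
```

The first call should print `A\nB\nC` and the second `A\nB\nC\n`, so a trailing newline in the input is preserved as an escaped `\n` rather than silently dropped.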

5 Answers

  • Handling malformed files (i.e., files that don't end with the record separator) with awk is tricky.

  • `sed -z` is GNU-specific and has the side effect of slurping the whole (text) file into RAM, which might be an issue for huge files.

Thus, for a robust and reasonably portable solution I would use perl:

perl -pe 's/\n/\\n/'
Fravadona

With awk:

echo -ne "A\nB\nC" | awk 'BEGIN{FS="\n"; OFS="\\n"; RS=ORS=""} {$1=$1}1'

Output:

A\nB\nC

See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

Cyrus

I would use GNU AWK for this task in the following way:

echo -ne "A\nB\nC" | awk '{printf "%s%s",$0,RT?"\\n":""}'

gives output

A\nB\nC

(without trailing newline)

Explanation: I build the string to output from the current line's content ($0) followed by either a backslash and n or an empty string, depending on RT, which is the record terminator for the current record. RT is a newline for every line except the last, where it is an empty string (when the input does not end with a newline), so in a boolean context it is true for all but the last line. I used the so-called ternary operator here: condition?valueiftrue:valueiffalse.

(tested in GNU Awk 5.0.1)
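
To make the role of RT visible, here is a small sketch comparing input with and without a trailing newline. It assumes GNU awk is installed under the name `gawk` (RT is a gawk extension and is not available in other awks); the function name `escape_newlines_gawk` is hypothetical.

```shell
# Hypothetical wrapper; requires GNU awk, since RT is a gawk extension.
escape_newlines_gawk() {
    # RT holds the text that terminated the current record:
    # a newline normally, the empty string if the input ended without one.
    gawk '{ printf "%s%s", $0, (RT ? "\\n" : "") }'
}

if command -v gawk >/dev/null 2>&1; then
    printf 'A\nB\nC'   | escape_newlines_gawk; echo   # last record has an empty RT
    printf 'A\nB\nC\n' | escape_newlines_gawk; echo   # every record has RT set to a newline
fi
```

With gawk available, the first call should print `A\nB\nC` and the second `A\nB\nC\n`.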

Daweo

Using GNU awk for multi-char RS:

$ echo -ne "A\nB\n\nC" | awk -v RS='^$' -v ORS= -F'\n' -v OFS='\\n' '{$1=$1} 1'
A\nB\n\nC$
Ed Morton

This should solve the problem of blank lines in the middle of the input:

gecho -ne "A\nB\n\nC" | 
{m,g,n}awk 'BEGIN {  RS = "^$" ; FS = "\n" 
                    ORS =  "" ; OFS = "\\n" } NF = NF' | gcat -b
     1  A\nB\n\nC%   

A gawk-specific way via RT:

 gawk 'BEGIN { _ = ""; ORS =__= "\\n" } (ORS = RT ? __ : _)^_'
RARE Kpop Manifesto