Can anybody explain this small script to me?

echo -e "\"aa;bb\";cc ;\"dd ;ee\"; 
ff" | awk -v RS=\" -v ORS=\" 'NR%2==0{gsub(";",",")}
{print}'

In this input, fields are separated by semicolons (;), but if a field contains one or more semicolons, that field is surrounded by double quotes. It is a CSV file.

Therefore, all semicolons inside such quoted fields must be replaced before further parsing.

Andry

1 Answer

The echo prints two lines:

"aa;bb";cc ;"dd ;ee"; 
ff

awk then splits the input into records at each double quote (RS=\") and, in the even-numbered records, replaces every semicolon with a comma (gsub).

So the first record is the content before the first double quote. It is empty, but the important part is the condition NR%2==0. NR is 1, so the condition is false and gsub() is not executed. The record is printed followed by its ORS, so the output is a double quote.

The second record is aa;bb. NR%2==0 is true, so the semicolon is replaced.

The third record is ;cc ;. NR%2==0 is false, so it is printed unchanged.

And so on until the end of the file.
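To see the record numbering described above, here is a minimal sketch that prints the first four records with their NR instead of rewriting them. The function name `show_records` is only illustrative, and `printf` stands in for `echo -e` on the assumption that it behaves the same across shells.

```shell
# Hypothetical helper: feeds the question's input to awk and labels each record.
show_records() {
    printf '%s\n' '"aa;bb";cc ;"dd ;ee"; ' 'ff' |
    awk -v RS='"' 'NR <= 4 {
        # Even-numbered records are the ones that sat between a pair of quotes.
        printf "NR=%d NR%%2=%d record=[%s]\n", NR, NR % 2, $0
    }'
}

show_records
# NR=1 NR%2=1 record=[]
# NR=2 NR%2=0 record=[aa;bb]
# NR=3 NR%2=1 record=[;cc ;]
# NR=4 NR%2=0 record=[dd ;ee]
```

Only the records where NR%2 is 0 came from inside a quoted field, which is why the original script restricts gsub() to them.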

Birei
  • Sorry, but I cannot understand why the condition is true in *"For second record content will be aa;bb, NR%2==0 will be true and will replace the semicolon."* and false in *"For third record content will be ;cc ;, NR%2==0 will be false and it will be printed."* – Andry Jul 12 '13 at 07:59
  • @Andry: Each double quote separates each record, and variable `NR` means number of records read. So `2 % 2 == 0` is true and `3 % 2 == 0` is false. – Birei Jul 12 '13 at 08:08
  • Thanks, friend. Now I understand everything. – Andry Jul 12 '13 at 09:15
  • Since you are so proficient in awk, can you help me with quoting single quotes inside an awk expression? E.g. awk 'BEGIN{RS="\"";ORS="\'"} $1'. This does not work. I use sh as my shell on FreeBSD. – Andry Jul 12 '13 at 09:59
  • @Andry: The example in your question handles it fine. Escape them with backslashes. It's the same for a `BEGIN` block. It works for me, so I don't know in what way it doesn't work for you. – Birei Jul 12 '13 at 10:08
  • My answer: awk 'BEGIN{RS="\"";ORS="'\''"} $1' . Maybe it will save somebody some time. When you need a single quote inside an awk program, wrap the escaped quote in a pair of single quotes: '\''. – Andry Jul 12 '13 at 10:11
  • For me, just ORS="\'" does not work. I use ORS="'\''". I found the right solution here: http://stackoverflow.com/questions/9899001/how-to-escape-single-quote-in-awk-inside-printf (Kaz's comment) – Andry Jul 12 '13 at 10:15
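
The quoting problem in the last few comments comes from the shell, not from awk: inside a single-quoted shell string a backslash is literal, so the `'` in `"\'"` ends the string early. Below is a small sketch of three ways to get a single quote into ORS under a POSIX sh. The function names and the `a"b"c` test input are made up for illustration.

```shell
# Hypothetical helpers; each one prints a'b'c' followed by a newline.

# 1. Close the shell quote, add an escaped quote, reopen it: '\''
close_reopen() { printf 'a"b"c' | awk 'BEGIN { RS = "\""; ORS = "'\''" } { print }'; echo; }

# 2. Pass the separators with -v, so the shell quoting stays outside the awk program.
via_dash_v() { printf 'a"b"c' | awk -v RS='"' -v ORS="'" '{ print }'; echo; }

# 3. Use the octal escape \047, which awk turns into a single quote.
via_octal() { printf 'a"b"c' | awk 'BEGIN { RS = "\""; ORS = "\047" } { print }'; echo; }

close_reopen
via_dash_v
via_octal
```

The `-v` form is arguably the easiest to read, and it is the style the script in the question already uses for RS and ORS.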