0

I have output from the Unix uniq -c command, which prints the number of occurrences of a string at the beginning of each line. Each string represents two authors separated by a pipe (e.g., Aabdel-Wahab S|Abdel-Hafeez EH).

  1 Aabdel-Wahab S|Abdel-Hafeez EH
  1 Aabdel-Wahab S|Abdulla AM
  4 Aabdel-Wahab S|Ahmad AK
  1 Aabdel-Wahab S|Mosalem FA
  1 Aabye MG|Andersen AB
  8 Aabye MG|Changalucha J
  1 Aabye MG|Christensen DL
  1 Aabye MG|Faurholt-Jepsen D

I need to extract the occurrence count and move it to the end of the line. For example:

Aabdel-Wahab S|Abdel-Hafeez EH|1
Aabdel-Wahab S|Abdulla AM|1
Aabdel-Wahab S|Ahmad AK|4
Aabdel-Wahab S|Mosalem FA|1
Aabye MG|Andersen AB|1
Aabye MG|Changalucha J|8
Aabye MG|Christensen DL|1
Aabye MG|Faurholt-Jepsen D|1

Please note that the frequencies are now pipe-delimited. Pasted below is my one-liner in Awk:

awk '{num=$1;$1=""; sub(/^ /,""); print $0,"|",num;}' file

However, Awk adds extra spaces around the final pipe:

Aabdel-Wahab S|Abdel-Hafeez EH | 1
Aabdel-Wahab S|Abdulla AM | 1
Aabdel-Wahab S|Ahmad AK | 4
Aabdel-Wahab S|Mosalem FA | 1
Aabye MG|Andersen AB | 1
Aabye MG|Changalucha J | 8
Aabye MG|Christensen DL | 1
Aabye MG|Faurholt-Jepsen D | 1

Any idea how to proceed (not necessarily using Awk)?

αғsнιη
Andrej

4 Answers

2

This is a good case for using sed instead of awk:

sed 's/^  *\([0-9][0-9]*\) *\(.*\)/\2|\1/' file

Regex breakdown:

  • "^  *" (start of line, one space, then " *") Start with at least one space
  • \( Start of capturing group one
    • [0-9][0-9]* Match at least one digit
  • \) End of CG one
  • " *" (a space followed by *) Any number of space characters, including none
  • \(.*\) Capture rest of input line (CG two)

The replacement string swaps the order of the two capturing groups and puts a single | between them.
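To see the substitution at work, here is a minimal sketch that feeds a few lines shaped like uniq -c output through the same sed expression. The function name move_count and the sample variable are illustrative only, not part of the answer.

```shell
# Sample input shaped like `uniq -c` output: leading spaces, count, text.
sample='      1 Aabdel-Wahab S|Abdel-Hafeez EH
      4 Aabdel-Wahab S|Ahmad AK
      8 Aabye MG|Changalucha J'

# move_count is a hypothetical helper name wrapping the sed command above.
move_count() {
  # \1 holds the count, \2 the rest of the line; emit them swapped.
  sed 's/^  *\([0-9][0-9]*\) *\(.*\)/\2|\1/'
}

printf '%s\n' "$sample" | move_count
# Aabdel-Wahab S|Abdel-Hafeez EH|1
# Aabdel-Wahab S|Ahmad AK|4
# Aabye MG|Changalucha J|8
```

Because the pattern requires at least one leading space, a line that starts directly with a digit or a letter does not match and is printed unchanged.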

revo
2

Awk's not adding spaces by itself; you're telling awk to add them. What do you think the , means in print 1,2 (hint: look up OFS in the awk man page)? Just don't do that:

awk '{num=$1; $1=""; sub(/^ /,""); print $0 "|" num}' file
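The difference can be checked side by side. The sketch below runs both print forms on one sample line; the function names with_comma and with_concat are made up for the illustration.

```shell
line='      4 Aabdel-Wahab S|Ahmad AK'

# Comma-separated print arguments are joined with OFS (one space by default).
with_comma() {
  awk '{num=$1; $1=""; sub(/^ /,""); print $0, "|", num}'
}

# Adjacent expressions are concatenated with nothing in between.
with_concat() {
  awk '{num=$1; $1=""; sub(/^ /,""); print $0 "|" num}'
}

printf '%s\n' "$line" | with_comma    # Aabdel-Wahab S|Ahmad AK | 4
printf '%s\n' "$line" | with_concat   # Aabdel-Wahab S|Ahmad AK|4
```

Note that assigning to $1 makes awk rebuild $0 with OFS between the fields, which is why a single leading space is left over for sub() to remove.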
Ed Morton
1

You can use printf:

awk '{num=$1;$1=""; sub(/^ /,""); printf("%s|%s\n",$0,num);}' file
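As a variation (not from the answer above), the count can be stripped with sub() on $0 instead of assigning to $1. Assigning to a field rebuilds the record and squeezes runs of whitespace inside the names, while this sketch leaves the rest of the line untouched. The name strip_count is hypothetical, and the pattern assumes the count is padded with spaces only.

```shell
# strip_count is an illustrative helper name.
strip_count() {
  # Save the count, delete "spaces + digits + spaces" from the front of $0,
  # then print the remainder and the count joined by a pipe.
  awk '{num=$1; sub(/^ *[0-9]+ */, ""); printf "%s|%s\n", $0, num}'
}

printf '      4 Aabdel-Wahab S|Ahmad AK\n' | strip_count
# Aabdel-Wahab S|Ahmad AK|4
```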
llllllllll
1

Using sed:

sed -r 's/\s*([0-9]+)\s*(.*)/\2|\1/' infile
  • \s* matches zero or more whitespace characters (\s is a GNU extension).
  • ([0-9]+) matches one or more digits; the parentheses capture the match as group one.
  • (.*) matches the rest of the line; the parentheses capture it as group two.
  • In \2|\1, we print the second group, i.e. (.*), followed by the first group, i.e. ([0-9]+), with a pipe between them.

POSIXly, you would do:

sed 's/^ *\([0-9][0-9]*\) *\(.*\)$/\2|\1/' infile
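A quick way to check the POSIX form is to run it next to an extended-regex equivalent. In this sketch the helper names posix_form and ere_form are invented, and the -E flag with [[:space:]] is an assumption about your sed (both GNU and BSD sed accept it); it stands in for the GNU-only -r and \s.

```shell
# Basic regular expressions, as in the POSIX command above.
posix_form() {
  sed 's/^ *\([0-9][0-9]*\) *\(.*\)$/\2|\1/'
}

# Extended regular expressions; assumes a sed that supports -E.
ere_form() {
  sed -E 's/^[[:space:]]*([0-9]+)[[:space:]]*(.*)$/\2|\1/'
}

printf '      8 Aabye MG|Changalucha J\n' | posix_form   # Aabye MG|Changalucha J|8
printf '      8 Aabye MG|Changalucha J\n' | ere_form     # Aabye MG|Changalucha J|8
```

Since the leading " *" may match nothing, these forms also handle a line whose count starts in the first column.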
αғsнιη