
I need to reorder the columns of this (tab-separated) data:

   1 cat    plays
   1 dog    eats
   1 horse  runs
   1 red    dog
   1 the    cat
   1 the    cat

so that it prints like:

cat     plays   1
dog     eats    1
horse   runs    1
red     dog     1
the     cat     2

I have tried:

sort [input] | uniq -c | awk '{print $2 "\t" $3 "\t" $1}' > [output]

and the result is:

1       cat     1
1       dog     1
1       horse   1
1       red     1
2       the     1

Can anyone give me some insight on what is going wrong? Thank you.

owwoow14

4 Answers


Since the output of cat input | sort | uniq -c is:

   1    1 cat    plays
   1    1 dog    eats
   1    1 horse  runs
   1    1 red    dog
   2    1 the    cat

the words are now in $3 and $4 and the new count is in $1, so you need something like:

cat input | sort | uniq -c | awk '{print $3 "\t" $4 "\t" $1}'

You can also set the output field separator in awk:

cat input | sort | uniq -c | awk -v OFS="\t" '{print $3,$4,$1}'
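
To make the field shift easy to check, here is a minimal, self-contained sketch. The sample rows and the variable names `input` and `result` are assumptions for illustration and stand in for the real file:

```shell
# Hypothetical sample data mirroring the question (tab-separated).
input=$(printf '1\tcat\tplays\n1\tthe\tcat\n1\tthe\tcat\n')

# uniq -c prepends a count, so the original fields $1..$3 become $2..$4.
# The new count is $1, and the two words are $3 and $4.
result=$(printf '%s\n' "$input" | sort | uniq -c |
    awk -v OFS='\t' '{print $3, $4, $1}')

printf '%s\n' "$result"
```

With this sample, the duplicated `the cat` row should come out once with a count of 2.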
SergioAraujo
pNre
  • Note that if you have spaces in your values, `awk` will split on them. To avoid that, and restrict separators to tabs only, pass this extra argument: `awk -F $'\t' ...` – Mathieu Rey Mar 17 '20 at 02:40
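
One caveat when combining a tab-only separator with `uniq -c`: the count that `uniq -c` prepends is separated from the line by a space, not a tab, so it ends up glued to the first tab-delimited field. A hedged sketch of one way to handle that, using made-up sample values that contain spaces (the names `input`, `result` and `head` are illustrative only):

```shell
# Hypothetical sample where the values themselves contain spaces.
input=$(printf '1\tred dog\truns\n1\tthe cat\tsleeps\n1\tthe cat\tsleeps\n')

# With -F'\t', $1 looks like "      2 1": the uniq count plus the original
# first column. split() with " " splits on runs of whitespace and ignores
# leading blanks, so head[1] is the uniq count.
result=$(printf '%s\n' "$input" | sort | uniq -c |
    awk -F'\t' -v OFS='\t' '{split($1, head, " "); print $2, $3, head[1]}')

printf '%s\n' "$result"
```

Here `red dog` and `the cat` stay intact as single fields, and the last column carries the count from `uniq -c`.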

uniq -c adds an extra column. This should give you the output you want:

$ sort file | uniq -c | awk '{print $3 "\t" $4 "\t" $1}'
cat     plays   1
dog     eats    1
horse   runs    1
red     dog     1
the     cat     2
user000001

With awk and sort:

$ awk '{a[$2 OFS $3]++}END{for(k in a)print k,a[k]}' OFS='\t' file | sort -nk3 
cat     plays   1
dog     eats    1
horse   runs    1
red     dog     1
the     cat     2
Chris Seymour

If you have GNU awk (gawk), you can do everything in awk alone by using its built-in function asorti():

#!/usr/bin/gawk -f
{
    a[$2 "\t" $3]++
}
END {
    asorti(a, b)
    for (i = 1; i in b; ++i) print b[i] "\t" a[b[i]]
}

One line:

gawk '{++a[$2"\t"$3]}END{asorti(a,b);for(i=1;i in b;++i)print b[i]"\t"a[b[i]]}' file

Output:

cat     plays   1
dog     eats    1
horse   runs    1
red     dog     1
the     cat     2

UPDATE: To preserve original order without sorting use:

#!/usr/bin/awk -f
!a[$2 "\t" $3]++ {
    b[++i] = $2 "\t" $3
}
END {
    for (j = 1; j <= i; ++j) print b[j] "\t" a[b[j]]
}

Or

awk '!a[$2"\t"$3]++{b[++i]=$2"\t"$3}END{for(j=1;j<=i;++j)print b[j]"\t"a[b[j]]}' file

This version works with any awk, not just gawk.

The output is the same as before because the input here is already sorted.
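
To see the order-preserving behaviour, the input has to be unsorted. A small sketch under that assumption, with made-up rows and the array names `seen` and `order` chosen for readability (they play the roles of `a` and `b` in the script):

```shell
# Hypothetical unsorted sample: "the cat" appears first and is repeated later.
input=$(printf '1\tthe\tcat\n1\tcat\tplays\n1\tthe\tcat\n')

# seen[] counts each pair; order[] records each pair the first time it shows up.
result=$(printf '%s\n' "$input" | awk '
    !seen[$2 "\t" $3]++ { order[++n] = $2 "\t" $3 }
    END { for (j = 1; j <= n; ++j) print order[j] "\t" seen[order[j]] }')

printf '%s\n' "$result"
```

With this input, `the cat` is printed before `cat plays`, following first appearance rather than alphabetical order.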

konsolebox