6

I have many files containing the output of the command: uniq -c some_file > some_file.out

For example: 1.out:

 1 a
 2 b
 4 c

2.out

 2 b
 8 c

I would like to merge these results, so I get:

 1 a
 4 b
 12 c

I thought that sort or uniq could handle this, but I don't see any option for it. Writing a Ruby or Perl script is one way to go, but I'd like to do it easily with core *nix commands (like the sort and uniq mentioned above).

Edit: To be clear, I don't have the original files, so I have to merge the *.out files.

Thanks for help!

radarek

4 Answers

5

Try it with awk:

awk '{ count[$2] += $1 } END { for(elem in count) print count[elem], elem }' 1.out 2.out 
Philipp
  • OK, this should work for me. It's not ideal, because I'd expect a solution with O(N) memory usage, where N is the number of files, but it will work for now (unless my results get big). Thanks! – radarek Sep 25 '09 at 10:12
  • I don't think it's linear in the number of files because `awk` reads all files in sequence, one line at a time, and it only needs to keep the `count` array (hash table?) in memory. – Philipp Sep 25 '09 at 11:02
  • I didn't say that the solution given by Philipp is linear. I said that such a solution can be written. – radarek Sep 25 '09 at 11:05
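The kind of solution discussed in these comments could be sketched roughly as below. It assumes that every *.out file is already sorted by value in the C locale (which holds if each one was produced by something like LC_ALL=C sort | uniq -c); if that assumption is wrong, the merge gives wrong totals. The function name merge_counts is made up for illustration. sort -m only merges the pre-sorted inputs, and awk only remembers the current value and its running sum, so memory should not grow with the number of distinct values.

```shell
# Hypothetical helper: streaming merge of several `uniq -c` outputs.
# ASSUMPTION: each input file is already sorted by value in the C locale.
merge_counts() {
    # -m merges pre-sorted inputs, -b ignores leading blanks in the key,
    # -k2 uses everything from the second field to the end of the line.
    LC_ALL=C sort -m -b -k2 "$@" |
    awk '{
        cnt = $1; $1 = ""; key = substr($0, 2)   # split the count from the value
        if (NR > 1 && key != prev) {             # value changed: flush the sum
            print sum, prev
            sum = 0
        }
        sum += cnt
        prev = key
    }
    END { if (NR > 0) print sum, prev }'         # flush the last group
}
```

It would be called as merge_counts 1.out 2.out (or merge_counts *.out), and the output comes out sorted by value.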
0

It's quite a specific problem, so it's unlikely that any tool will do this by default. You can script it in a small enough loop (no need for awk nastiness), implemented in any scripting language (even sh). I don't think there's another way.
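A minimal sketch of such a loop in plain sh, not necessarily what the answer had in mind: sort all lines by value first, then add up adjacent lines that share a value. The function name merge_counts_sh is made up for illustration, and values are assumed not to contain backslashes or leading/trailing whitespace that matters.

```shell
# Hypothetical helper: merge `uniq -c` outputs with sort and a plain sh loop.
merge_counts_sh() {
    # Sort every line by its value (second field onwards, ignoring leading blanks).
    LC_ALL=C sort -b -k2 "$@" | {
        prev=
        sum=0
        seen=0
        # `read` puts the first word in count and the rest of the line in value.
        while read -r count value; do
            if [ "$seen" -eq 1 ] && [ "$value" != "$prev" ]; then
                printf '%s %s\n' "$sum" "$prev"   # value changed: print the total
                sum=0
            fi
            sum=$((sum + count))
            prev=$value
            seen=1
        done
        # Print the total of the last group, if there was any input at all.
        if [ "$seen" -eq 1 ]; then
            printf '%s %s\n' "$sum" "$prev"
        fi
    }
}
```

Usage would be merge_counts_sh 1.out 2.out.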

wds
0

This is not quite serious (but it works). I like Philipp's solution.

cat 1.out 2.out |
{
    # Repeat each value as many times as its count, then count again.
    while read -r line; do
        for i in $(seq "${line%% *}"); do
            echo "${line#* }"
        done
    done
} | sort | uniq -c
andre-r
0

The accepted answer works for the specific values given in the question. However, if a line of uniq -c output contains more spaces than the single one between the count and the value, it truncates the output after the second field. The following awk script does not:

awk '{ cnt=$1; $1=""; count[substr($0, 2)] += cnt } END { for(elem in count) print count[elem], elem }' 1.out 2.out
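To see the effect, here is a small sketch that runs the script on made-up sample files whose values contain spaces (the file contents and the temporary directory are assumptions for illustration only). The final sort is added only because the order of for (elem in count) is unspecified. Note that assigning to $1 makes awk rebuild the line with single spaces, so runs of whitespace inside a value are collapsed.

```shell
# Hypothetical sample data: values that contain spaces.
tmp=$(mktemp -d)
printf '      1 foo bar\n      2 baz\n' > "$tmp/1.out"
printf '      3 foo bar\n' > "$tmp/2.out"

# Same awk script as above; sort by value only to get a stable order.
merged=$(awk '{ cnt = $1; $1 = ""; count[substr($0, 2)] += cnt }
              END { for (elem in count) print count[elem], elem }' "$tmp"/*.out |
         LC_ALL=C sort -k2)

printf '%s\n' "$merged"
rm -r "$tmp"
```

With the sample data this prints 2 baz and 4 foo bar, whereas summing on $2 alone would report the second value as just foo.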
Daniel Beck