
I'll often end up, by way of gnu parallel, with a large file containing counts of various objects:

1201 object1
804 object1
327 object2
3828 object1
29 object2
277 object3
...

This'll often have several thousand lines with various objects in no particular order. I'll want a sum of the total counts of each object. My usual approach is to put together a Perl one-liner like this:

perl -lane '$O{$F[1]} += $F[0]; END {foreach $k (keys %O) {print "$k: $O{$k}"}}' countsfile

I'll typically have a pipeline consisting of parallel, awk, grep, sort, uniq, cut, etc. with fairly terse arguments each. The perl hack is an exception: it's long to type and much more complex than other parts of the pipeline. I always feel like I'm specifying far more than I really need to when typing it.

So my question: is there a technique or utility that'll let me do this without having to compose a full script every time? I'd like to be able to do this without using perl, awk, R, or other systems that implement general-purpose languages.
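For reference, GNU datamash is one existing specialized utility that covers exactly this group-and-sum pattern without a general-purpose language. A sketch, assuming datamash is installed (the sample file below mirrors the format above):

```shell
# Recreate a small countsfile in the format shown above
printf '%s\n' '1201 object1' '804 object1' '327 object2' \
              '3828 object1' '29 object2' '277 object3' > countsfile

# -W: fields are whitespace-separated
# -s: sort the input first (grouping requires sorted input)
# -g 2: group by the second field; sum 1: total the first field per group
datamash -sW -g 2 sum 1 < countsfile
```

Output is tab-separated, one line per object: object1 5833, object2 356, object3 277.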

Allen Luce
  • See also: http://stackoverflow.com/q/10286522/1258041 and http://superuser.com/q/136821/126558 – Lev Levitsky Jun 22 '15 at 22:42
  • Why not store the script in a file and simply execute the script? – ShellFish Jun 22 '15 at 22:55
  • I believe the standard solution here is an awk or perl one-liner. Me, I got tired of those a long time ago, and wrote a "math editor" tool that lets me easily invoke things like `med 'sum(c1)'` which simply sums column 1, but as this isn't a standard utility it's not an answer to your question. – Steve Summit Jun 22 '15 at 22:56
  • "Standard" isn't such a big deal -- have you published it anywhere? – Allen Luce Jun 23 '15 at 00:36
  • It's at https://www.eskimo.com/~scs/src/#med , but I'm afraid it still won't help you, because I misread the question you posed, which ends up being a notch more complicated than med can handle. – Steve Summit Jun 23 '15 at 03:14
  • @ShellFish That's what I usually end up doing. I just figured that need oughta be common enough that someone's already wrapped up a neat little utility for it. – Allen Luce Jun 23 '15 at 03:51
  • Instead of gnu parallel, have you considered running the whole thing as parallel perl? That way you probably don't need a long pipe. – Sobrique Jun 23 '15 at 06:19
  • I intend to keep the pipe construction for this particular class of tasks. It's quicker and more convenient for me to do so. – Allen Luce Jun 23 '15 at 08:59
  • Sorry but this is too specialized and WAY too trivial to solve in a tiny awk or perl script for there to be any unique utility to do it. You don't need all of those other commands you are piping together if you are using awk. One small, simple awk script would almost certainly do everything you want. – Ed Morton Jun 23 '15 at 15:59

2 Answers


Most of your code is in the END block used to display the hash. You can make that much more concise by using while instead of for:

perl -lanE '$O{$F[1]} += $F[0]; END {say "@v" while @v = each %O}' countsfile

output

object1 5833
object3 277
object2 356

or, if you're inclined to install Data::Dump you can lose the loop altogether

perl -MData::Dump -lanE '$O{$F[1]} += $F[0]; END {dd \%O}' countsfile

output

{ object1 => 5833, object2 => 356, object3 => 277 }

You can even mess with the syntax to avoid the need for the END block

perl -lanE '$O{$F[1]} += $F[0];}{say "@v" while @v = each %O' countsfile
Borodin
  • I'm looking to avoid this sort of construction, and am looking for a more specialized tool than perl. – Allen Luce Jun 23 '15 at 09:00
  • @DougLuce: Then I'm afraid your question is unclear. You've been asked why you don't simply put the Perl code into a program file. You didn't respond to that, and I can't imagine a more specialised tool. At the same time you say you *“intend to keep the pipe construction”*, which seems to go in the opposite direction. The only other answer you have is very similar to my own but using awk in place of Perl, so it isn't just me being dumb – Borodin Jun 23 '15 at 09:08
  • @DougLuce Why would anyone have created a utility for this, when it's easily doable with less than a line of code? – 123 Jun 23 '15 at 11:22
  • See Steve Summit's comment on my question. I'm tired of cranking out very similar one-liner scripts for this kind of situation. My question is not "should I put my script into a file and run it in various places" nor are answers and comments to this effect particularly helpful. – Allen Luce Jun 25 '15 at 22:04
  • The while in the end clause as an idiom is nice, thanks for that. I just used it in yet another instance that the utility I fantasize about should tackle: `perl -alne '$x{$F[0]} .= $F[1] . " "; END {print "@v" while @v = each %x}'` – Allen Luce Jun 25 '15 at 22:05

With awk:

awk '{sum[$2]+=$1}END{for(i in sum)print i,sum[i]}' File

Using the second field (objectx) as the index, update the sum array: that is, add the first field (the number) to sum[objectx]. At the end, print each index and the element stored there, which is the per-object total.

Sample:

AMD$ awk '{sum[$2]+=$1}END{for(i in sum)print i,sum[i]}' File
object1 5833
object2 356
object3 277
Arjun Mathew Dan
  • This really isn't an improvement over what I'm already doing. I want to avoid having to write in perl or awk or anything that requires this level of complexity. So I'm looking for something specialized. – Allen Luce Jun 23 '15 at 09:02
  • Perhaps you can define an alias and use it... alias mycmd="awk '{sum[\$2]+=\$1}END{for(i in sum)print i,sum[i]}'" . You can also add this to your .bashrc. Then use it like mycmd File... sorry, I don't know about any other simpler utility as such.. – Arjun Mathew Dan Jun 23 '15 at 09:24
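The alias suggested in the comment above needs careful backslash-escaping of the $ signs; a shell function sidesteps that, since single quotes survive intact inside a function body. A sketch (the name sumby is my own invention):

```shell
# Sum field 1 grouped by field 2, reading stdin or the named files
sumby() { awk '{sum[$2]+=$1} END {for (i in sum) print i, sum[i]}' "$@"; }

# Usage: pipe the counts in, or pass a filename as an argument
printf '%s\n' '1201 object1' '804 object1' '327 object2' \
              '3828 object1' '29 object2' '277 object3' | sumby
```

Note that awk's for (i in sum) iterates in no guaranteed order; append | sort to the pipeline if a stable order matters.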