I am tasked with taking a file whose lines contain the string username=xxxx:

$ cat file.txt
Yadayada username=jdoe blablabla
Yadayada username=jdoe blablabla
Yadayada username=jdoe blablabla
Yadayada username=dsmith blablabla
Yadayada username=dsmith blablabla
Yadayada username=sjones blablabla

I need to find how many times each user shows up in the file. I can do this manually for one user at a time, for example by feeding it username=jdoe:

$ grep -r "username=jdoe" file.txt  | wc -l | tr -d ' '
3

What's the best way to report each user in the file along with the number of lines for that user, sorted from highest to lowest count? Something like this:

3    jdoe
2    dsmith
1    sjones

Been thinking of how to approach this, but drawing blanks, figured I'd check with our gurus on this forum. :)

TIA, Don

6 Answers

Using sed, uniq, and sort:

sed 's/.*username=\([^ ]*\).*/\1/' file.txt | sort | uniq -c | sort -nr

If some lines have no username, use -n together with the p flag so that only lines where the substitution succeeded are printed:

sed -n 's/.*username=\([^ ]*\).*/\1/p' input | sort | uniq -c | sort -nr
perreal
  • This will break if the users do not appear in order. You can verify by moving first line to last line and try again. – Gautam May 17 '18 at 06:08
  • Awesome feedback from so many people, I wish I had beer for you all!!! This works: `sed -n 's/.*username=\([^ ]*\).*/\1/p' input | sort | uniq -c | sort -nr` but I get a single line return: `3 jdoe 2 dsmith 1 sjones`. This might be all we need, but wondered if there is a way to preserve the carriage returns? – donmontalvo May 18 '18 at 13:58
  • The output is coming from sort, there has to be newline. How are you running the command, from the terminal? What is your OS? – perreal May 18 '18 at 14:07
  • @perreal I added the line to our script. My apologies, I need to tattoo "Always include OS info on stackOverflow posts!" to my forehead. :) This is on macOS (High Sierra 10.13.4). – donmontalvo May 18 '18 at 18:02
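One possible cause of the single-line result described in the comments above, assuming the script captures the pipeline output in a variable and then echoes it unquoted (the thread does not confirm this), is word splitting by the shell. A minimal sketch of the difference, where the variable names `report`, `flat`, and `kept` are hypothetical:

```shell
# Multi-line output captured in a variable, as a script might do.
# (All variable names here are hypothetical, for illustration only.)
report=$(printf '3 jdoe\n2 dsmith\n1 sjones\n')

# Unquoted expansion: sh/bash split the value on whitespace,
# and echo joins the resulting words with single spaces.
flat=$(echo $report)

# Quoted expansion: the embedded newlines are passed through intact.
kept=$(printf '%s\n' "$report")

printf 'unquoted: %s\n' "$flat"
printf 'quoted:\n%s\n' "$kept"
```

If that is what is happening, quoting the variable when printing it should keep one user per line.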

In GNU awk:

$ awk '
BEGIN { RS="[ \n]" }
/=/ {
    split($0,a,"=")
    u[a[2]]++ }
END {
    PROCINFO["sorted_in"]="@val_num_desc"
    for(i in u)
        print u[i],i
}' file
3 jdoe
2 dsmith
1 sjones
James Brown
  • Don's an Apple guy. His Macs come with BSD awk, I suspect. :) (Ya I know, he didn't specify. Thank you for specifying.) – ghoti May 17 '18 at 06:22
  • How does Apple's awk feel about `RS="[ \n]"`? Sincerely, oranges. :D – James Brown May 17 '18 at 06:51
  • Multi-char RS is gawk-specific. Well, **maybe** mawk too these days, as mawk's been adopting gawk functionality lately. But that's all. Ditto for PROCINFO and sorted_in. Btw I've noticed you using a string as the 3rd arg to split() in several scripts recently - the 3rd arg to split() is a regexp, not a string, so you should use regexp delimiters so awk doesn't have to convert it from a string to a regexp before using it. – Ed Morton May 17 '18 at 12:32
  • Quick and dirty. I stand corrected. And don't get me started on coffee, can't have that anymore. Probably related to recent sloppiness. – James Brown May 17 '18 at 12:59
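Following the points raised in these comments, a variant that avoids the gawk-only features (multi-character RS and PROCINFO["sorted_in"]) could look like the sketch below. It is not the answerer's code: it loops over the default whitespace-separated fields, passes a regexp literal as the third argument to split(), and leaves the ordering to an external sort. The temporary file and the `portable` variable are assumptions for illustration; the sample data is copied from the question.

```shell
# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Yadayada username=jdoe blablabla
Yadayada username=jdoe blablabla
Yadayada username=jdoe blablabla
Yadayada username=dsmith blablabla
Yadayada username=dsmith blablabla
Yadayada username=sjones blablabla
EOF

# Scan every field for one starting with "username=", split it on "="
# using a regexp literal, and count the part after the "=".
portable=$(awk '
  {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^username=/) {
        split($i, a, /=/)
        u[a[2]]++
      }
  }
  END { for (k in u) print u[k], k }
' "$tmp" | sort -nr)

printf '%s\n' "$portable"
rm -f "$tmp"
```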

Using grep :

$ grep -o 'username=[^ ]*' file | cut -d "=" -f 2 | sort | uniq -c | sort -nr
Gautam

Awk alone:

awk '
  {sub(/.*username=/,""); sub(/ .*/,"")}
  {a[$0]++}
  END {for(i in a) printf "%d\t%s\n",a[i],i | "sort -nr"}
' file.txt

This uses awk's sub() function to achieve what grep -o does in other answers. It embeds the call to sort within the awk script. You could of course use that pipe after the awk script rather than within it if you prefer.

Oh, and unlike the other awk solutions presented here, this one (1) is portable to non-GNU-awk environments (like BSD, macOS) and (2) doesn't depend on the username being in a predictable location on each line (i.e. $2).

Why might awk be a better choice than simpler tools like uniq? It probably isn't, for a super simple requirement like this. But it's good to have in your toolbox when you need a little more text processing capability.
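The alternative mentioned above, with the pipe to sort placed after the awk script rather than inside it, might look like the following sketch. The sample data is copied from the question; the temporary file, the `counts` variable, and the `/username=/` guard (which skips lines that have no username) are additions for illustration and are not part of the answer as posted.

```shell
# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Yadayada username=jdoe blablabla
Yadayada username=jdoe blablabla
Yadayada username=jdoe blablabla
Yadayada username=dsmith blablabla
Yadayada username=dsmith blablabla
Yadayada username=sjones blablabla
EOF

# Same sub() approach as above, but awk only counts; the shell pipeline
# does the sorting. The /username=/ guard is an added assumption that
# ignores lines without a username.
counts=$(awk '
  /username=/ {
    sub(/.*username=/, "")   # drop everything up to and including "username="
    sub(/ .*/, "")           # drop everything from the next space onward
    a[$0]++
  }
  END { for (i in a) printf "%d\t%s\n", a[i], i }
' "$tmp" | sort -nr)

printf '%s\n' "$counts"
rm -f "$tmp"
```

Keeping sort outside awk makes it easy to swap in a different ordering, or to drop the sort entirely, without touching the awk script.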

ghoti
$ awk -F'[= ]' '{print $3}' file | sort | uniq -c | sort -nr
      3 jdoe
      2 dsmith
      1 sjones
Ed Morton

The following awk command may also help:

awk -F"[ =]" '{a[$3]++} END{for(i in a){print a[i],i | "sort -nr"}}' Input_file
RavinderSingh13