Word count and its output

Question

I have the following lines:

123;123;#rss
123;123;#site #design #rss
123;123;#rss
123;123;#rss
123;123;#site #design

and need to count how many times each tag appears. I do the following:

grep -Eo '#[a-z].*' ./1.txt | tr "\ " "\n" | uniq -c

i.e. first select only the tags from the lines, then split them onto separate lines and count them.

output:

   1 #rss
   1 #site
   1 #design
   3 #rss
   1 #site
   1 #design

instead of the expected:

   2 #site
   4 #rss
   2 #design

It seems that the problem is caused by non-printable characters, which make the count incorrect. Or is it something else? Can anyone suggest a correct solution?

`uniq` requires the input to already be sorted; one quick fix would be `... | sort | uniq -c`. Also, the `.*` matches zero or more of any character (including whitespace and non-printing characters); try `'#[a-z]+'` to limit the match to lower-case letters – markp-fuso, Feb 10 '21 at 14:57
Please have a look at [What should I do when someone answers my question?](https://stackoverflow.com/help/someone-answers) — Socowi, Feb 16 '21 at 00:16

score 2 · Accepted Answer · answered Feb 10 '21 at 14:56

uniq -c only counts runs of identical adjacent lines, so it gives per-tag totals only on sorted input.
Also, you can drop the tr by changing the regex to #[a-z]*.

grep -Eo '#[a-z]*' ./1.txt | sort | uniq -c

prints

  2 #design
  4 #rss
  2 #site

as expected.
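To see why the missing sort matters, here is a minimal sketch that runs both pipelines on the sample lines from the question. The temporary file and the variable names `unsorted` and `sorted` are assumptions for illustration, and the trailing awk call only strips the padding that uniq -c adds.

```shell
# Sample lines from the question, written to a temporary file (illustrative).
tmp=$(mktemp)
printf '%s\n' \
  '123;123;#rss' \
  '123;123;#site #design #rss' \
  '123;123;#rss' \
  '123;123;#rss' \
  '123;123;#site #design' > "$tmp"

# Without sort: uniq -c only collapses runs of identical adjacent lines,
# so a tag that reappears later starts a new count.
unsorted=$(grep -Eo '#[a-z]+' "$tmp" | uniq -c | awk '{print $1, $2}')

# With sort: identical tags become adjacent, giving one total per tag.
sorted=$(grep -Eo '#[a-z]+' "$tmp" | LC_ALL=C sort | uniq -c | awk '{print $1, $2}')

printf 'unsorted:\n%s\n\nsorted:\n%s\n' "$unsorted" "$sorted"
rm -f "$tmp"
```

The unsorted run reproduces the six-line output from the question, while the sorted run yields one line per tag.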

Socowi

score 1 · Answer 2 · answered Feb 10 '21 at 15:02

It can be done in a single GNU awk command:

awk -v RS='#[a-zA-Z]+' 'RT {++freq[RT]} END {for (i in freq) print freq[i], i}' file

2 #site
2 #design
4 #rss

Or else a grep + awk solution:

grep -iEo '#[a-z]+' file |
awk '{++freq[$1]} END {for (i in freq) print freq[i], i}'

2 #site
2 #design
4 #rss
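Note that `for (i in freq)` visits the array in an unspecified order, so the order of the lines above can differ between awk implementations. A minimal sketch, assuming a stable order with the most frequent tag first is wanted, pipes the result through sort. The variable name `ranked` and the inline sample data are illustrative only.

```shell
# Sample lines from the question, fed in on stdin (illustrative).
ranked=$(printf '%s\n' \
  '123;123;#rss' \
  '123;123;#site #design #rss' \
  '123;123;#rss' \
  '123;123;#rss' \
  '123;123;#site #design' |
  grep -iEo '#[a-z]+' |
  awk '{++freq[$1]} END {for (i in freq) print freq[i], i}' |
  # Sort by count (numeric, descending), then by tag name to break ties.
  LC_ALL=C sort -k1,1nr -k2)

printf '%s\n' "$ranked"
```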

score 0 · Answer 3 · answered Feb 10 '21 at 15:02

Using awk as an alternative:

awk -F [" "\;] '{ for(i=3;i<=NF;i++) {  map[$i]++ } } END { for (i in map) { print map[i]" "i} }' file

Set the field separator to a space or a ";". Then loop from the third field to the last field (NF), using each field as an index into the array map and incrementing the counter stored there. Once the whole file has been processed, loop through the map array and print each count followed by its index.

Raman Sailopal

`-F [" "\;]` should be `-F '[ ;]'`. Your array is keeping a count, not providing a mapping, so `cnt[]` or similar would be a more useful name for it than `map[]`. Also - `print map[i]" "i` = `print map[i], i` - let OFS have its reason to live :-). – Ed Morton Feb 11 '21 at 22:47
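Putting the suggestions from this comment together gives roughly the following sketch. The quoted separator `'[ ;]'`, the array name `cnt`, and the comma in `print` come from the comment; the guard against empty fields, the final sort, the variable name `counts`, and the inline sample data are assumptions added for illustration.

```shell
# Sample lines from the question, fed in on stdin (illustrative).
counts=$(printf '%s\n' \
  '123;123;#rss' \
  '123;123;#site #design #rss' \
  '123;123;#rss' \
  '123;123;#rss' \
  '123;123;#site #design' |
  awk -F '[ ;]' '
    # Fields 3..NF hold the tags; skip empty fields (e.g. from doubled spaces).
    { for (i = 3; i <= NF; i++) if ($i != "") cnt[$i]++ }
    # The comma makes print join count and tag with OFS (a space by default).
    END { for (tag in cnt) print cnt[tag], tag }
  ' | LC_ALL=C sort -k2)

printf '%s\n' "$counts"
```

Quoting the separator keeps the shell from treating the brackets as a glob pattern before awk ever sees them.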

RavinderSingh13 · Answer 4 · 2021-02-10T15:27:17.183

With your shown samples only, could you please try the following. Written and tested in GNU awk.

awk '
{
  while($0){
    match($0,/#[^ ]*/)
    count[substr($0,RSTART,RLENGTH)]++
    $0=substr($0,RSTART+RLENGTH)
  }
}
END{
  for(key in count){
    print count[key],key
  }
}' Input_file

Output will be as follows.

2 #site
2 #design
4 #rss

Explanation: a commented version of the above.

awk '                                     ##Start of the awk program.
{
  while($0){                              ##Loop while anything is left of the current line.
    match($0,/#[^ ]*/)                    ##Use match() to find the regex #[^ ]* in the current line.
    count[substr($0,RSTART,RLENGTH)]++    ##Use the matched substring as an index into the count array and increment its value by 1.
    $0=substr($0,RSTART+RLENGTH)          ##Replace the current line with whatever follows the match.
  }
}
END{                                      ##Start of the END block of this program.
  for(key in count){                      ##Loop through the indexes of count.
    print count[key],key                  ##Print each count followed by its index (the tag).
  }
}
' Input_file                              ##Name of the input file.
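One caveat with the loop above, as far as I can tell: when match() finds nothing it sets RSTART to 0 and RLENGTH to -1, so the line is never shortened and the loop does not end on input such as a line without tags or with trailing spaces. A minimal sketch of a variant that uses the return value of match() as the loop condition is shown here. The variable names `rest` and `tags`, the final sort, and the extra tagless line `123;123;` are assumptions for illustration, not part of the original answer.

```shell
# Sample lines from the question plus one line without any tag (illustrative).
tags=$(printf '%s\n' \
  '123;123;#rss' \
  '123;123;#site #design #rss' \
  '123;123;#rss' \
  '123;123;#rss' \
  '123;123;#site #design' \
  '123;123;' |
  awk '
    {
      rest = $0
      # match() returns 0 once no tag is left, which ends the loop.
      while (match(rest, /#[^ ]+/)) {
        count[substr(rest, RSTART, RLENGTH)]++
        rest = substr(rest, RSTART + RLENGTH)   # keep only the text after the match
      }
    }
    END { for (key in count) print count[key], key }
  ' | LC_ALL=C sort -k2)

printf '%s\n' "$tags"
```

Working on a copy in `rest` also avoids reassigning `$0`, which would make awk re-split the fields on every iteration.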

Ed Morton · Answer 5 · 2021-02-11T22:55:08.610

$ cut -d';' -f3 file | tr ' ' '\n' | sort | uniq -c
      2 #design
      4 #rss
      2 #site

edited Feb 11 '21 at 22:55

answered Feb 11 '21 at 22:49
