How to get the group by count in unix

Question

I have a list of records as follows:

Item1,200
Item1,200
Item3,900
Item2,500
Item2,800
Item1,600
Item4,
Item5,
Item4,100
Item5,
Item5,444

My output should be

"Please check the file as Item1 is greater than 2"

My awk command produces the output below, which counts the blank records, but it should not:

Item1 3
Item2 2
Item3 1
Item4 2
Item5 3

The Unix command should count the items in the above list, ignoring records with a blank second field, and print a statement such as 'please check the records' if the count of any item is greater than 2.

I have tried the awk command below, but I am unable to filter out the blanks and report the items whose count is greater than 2.

awk -F, '{a[$1]++;}END{for (i in a)print i, a[i];}' filetest.txt

What is your expected output for given input, state that clearly in question — Inian, Aug 14 '17 at 09:06

score 2 · Answer 1 · answered Aug 14 '17 at 09:03

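You can put $2 as a pattern in front of an action so that the action only runs when the second field is non-empty. Similarly, put $3 in front of an action to detect records that have a third field and print an error message for them. Note that awk has no $_ variable; the whole record is $0, and it must sit outside the quoted string to be expanded.

awk -F, '$3 {print "Please check the records: " $0} $2 {a[$1]++} END {for (i in a) print i, a[i]}' filetest.txt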
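One caveat with the bare $2 guard: awk treats a field as false when it is empty or when it is numerically zero, so a record with a genuine value of 0 would be skipped too. The sketch below illustrates the difference on made-up inline data (Item6 and its 0 value are hypothetical, not part of the question's file).

```shell
# Hypothetical sample: Item6 has a real value of 0, Item4 has one blank value.
sample='Item6,0
Item4,
Item4,100'

# Truthiness guard: skips the blank record, but also skips the 0.
truthy=$(printf '%s\n' "$sample" |
  awk -F, '$2 {a[$1]++} END {for (i in a) print i, a[i]}' | sort)

# Explicit empty-string test: skips only the blank record.
nonblank=$(printf '%s\n' "$sample" |
  awk -F, '$2 != "" {a[$1]++} END {for (i in a) print i, a[i]}' | sort)

echo "with \$2:       $truthy"
echo "with \$2 != \"\": $nonblank"
```

If zero is a possible value in the second column, the explicit comparison is probably the safer choice.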

Johannes Riecken

Thanks!! I should check whether the any of the item count is > 2 (without blanks) if yes then I should print a statement that file is incorrect else file is correct – Bobby Aug 14 '17 at 09:16
You mean like this? `awk -F, '$2 {a[$1]++;}END{for (i in a){print i, a[i]}for (i in a){if(a[i]>2){print "Please check the records: ", i, a[i]}}}' filetest.txt` – Johannes Riecken Aug 14 '17 at 09:26
Yeah Thanks!!.. I have one more question similar to this but some what enhanced, Source: a,yes a,yes b,No c,N/A c,N/A c,N/A Here Yes,No are acceptable If different word have highest count then send a statement as "Please Check" – Bobby Aug 14 '17 at 09:58
As the poster of the other answer said, you will have to write another question and show some effort trying to solve the problem yourself. awk has very nice documentation with lots of examples. – Johannes Riecken Aug 15 '17 at 06:14

RavinderSingh13 · Accepted Answer · 2017-08-14T12:54:11.660

Try the following as well, assuming you need the output in sorted order.

awk -F, '$2{array[$1]++} END{for(k in array){print k,array[k] | "sort -k1"}}'  Input_file

Output will be as follows.

Item1 3
Item2 2
Item3 1
Item4 1
Item5 1

EDIT: Since the OP asked for counts of both the first and the second field, try the following.

awk -F, '$2{array[$1]++;array2[$1" "$2]++;array3[$2]++} END{for(u in array){for(y in array3){if(array2[u" "y]){print u,array[u],y,array2[u" "y]}}}}' Input_file

OR

awk -F, '$2{
  array[$1]++;
  array2[$1" "$2]++;
  array3[$2]++
  }
END{
  for(u in array){
    for(y in array3){
      if(array2[u" "y]){
        print u,array[u],y,array2[u" "y]
      }
    }
  }
}'  Input_file

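Output will be as follows (awk does not guarantee the order of a for-in loop, so pipe the result through sort if the order matters).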

Item1 3 200 2
Item1 3 600 1
Item2 2 500 1
Item2 2 800 1
Item3 1 900 1
Item4 1 100 1
Item5 1 444 1

EDIT2: Adding one more solution, as per the OP's latest request, which omits any duplicate $2 value when counting each $1 value.

awk -F, '$2 && !array2[$1,$2]++{array[$1]++} END{for(k in array){print k,array[k] | "sort -k1"}}'   Input_file

Output will be as follows.

Item1 2
Item2 2
Item3 1
Item4 1
Item5 1
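The array2[$1,$2] subscript joins both fields with awk's built-in SUBSEP character, which is non-printing by default. That is why printing such a key directly shows Item1200. A minimal sketch, assuming the goal is to list each distinct pair with its count, that splits the key back apart (the names seen and part are illustrative only):

```shell
# Three records from the question; Item1,200 appears twice.
pairs=$(printf '%s\n' Item1,200 Item1,200 Item1,600 |
  awk -F, '
    $2 != "" { seen[$1,$2]++ }          # key is $1 SUBSEP $2
    END {
      for (key in seen) {
        split(key, part, SUBSEP)        # part[1] = item, part[2] = value
        print part[1] "," part[2], seen[key]
      }
    }' | sort)

echo "$pairs"
```

Splitting on SUBSEP keeps the two fields separate even if either one contains spaces, which the array2[$1" "$2] form used earlier would not.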

edited Aug 14 '17 at 12:54

answered Aug 14 '17 at 09:49

RavinderSingh13

Okay!!.. if we group by on 2 columns like {a[$1,$2]++;} then am getting final output as "Item1200" ,Can get as separate strings as "Item1,200" ? – Bobby Aug 14 '17 at 10:57
change print above to print k","array[k] and let me know if this helps you., – RavinderSingh13 Aug 14 '17 at 11:11
Nope!! "Item1200,2" this it s output I have got. – Bobby Aug 14 '17 at 11:35
Please update your question with Input_file what you are using and let me know then, my code was given to shown Input_file sample only. – RavinderSingh13 Aug 14 '17 at 11:41
sorry My bad!! I want output like distinct count of $1 and $2 – Bobby Aug 14 '17 at 11:50
not an issue, could you please check my edit if this is what you need. Let me know on same then. – RavinderSingh13 Aug 14 '17 at 12:33
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/151884/discussion-between-bobby-and-ravindersingh13). – Bobby Aug 14 '17 at 12:40
Thanks for your effort, But I need a small change in that My output should be like Item1 2 Item2 2 Item3 1 Item4 1 Item5 1 Item 1 has a same value in second column I need to take only 1 from that – Bobby Aug 14 '17 at 12:44
As requested before, could you please EDIT your post with sample output and logic by which you need, comments are not meant for this. – RavinderSingh13 Aug 14 '17 at 12:46
Am using the same, If you see the input file for Item1 there are total 3 rows but first two are having same data as item1 200 , Now I want to omit common data and take only distinct as Item1 as 2 unique data – Bobby Aug 14 '17 at 12:50
Excellent Bro! Thanks !! Working fine!! – Bobby Aug 14 '17 at 13:00
Please check this once !! [https://stackoverflow.com/questions/45675372/how-to-get-the-validate-the-count-with-the-group-by-data-in-unix] – Bobby Aug 14 '17 at 13:14

score 0 · Answer 3 · answered Aug 14 '17 at 09:28

Add a condition that checks for blank fields ($2 != "").

awk -F, '$2 != "" {a[$1]++} END {for (i in a) { if (a[i] > 2) { print "Check the records for " i } } }' filetest.txt

Result:

Check the records for Item1
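To get the exact message the question asks for, plus an exit status that a calling script can test, the same filter could be wrapped roughly as below. This is only a sketch: the limit variable, the temporary file, and the nonzero exit code are assumptions added for illustration, not part of the original answer.

```shell
# Recreate the question's 11 records in a temporary file.
tmp=$(mktemp)
printf '%s\n' Item1,200 Item1,200 Item3,900 Item2,500 Item2,800 Item1,600 \
  Item4, Item5, Item4,100 Item5, Item5,444 > "$tmp"

# Count non-blank records per item, report every item whose count exceeds
# the limit, and exit with status 1 if at least one item was reported.
status=0
result=$(awk -F, -v limit=2 '
  $2 != "" { count[$1]++ }
  END {
    for (item in count)
      if (count[item] > limit) {
        print "Please check the file as " item " is greater than " limit
        bad = 1
      }
    exit bad + 0
  }' "$tmp") || status=$?
rm -f "$tmp"

echo "$result"
echo "exit status: $status"
```

Passing the threshold with -v keeps it out of the awk program text, so the same command can be reused with a different limit.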

Raman Sailopal

Thanks!!.. I have one more question similar to this but some what enhanced, Source: a,yes a,yes b,No c,N/A c,N/A c,N/A Here Yes,No are acceptable If different word have highest count then send a statement as "Please Check" – Bobby Aug 14 '17 at 10:19
Post another question making it clear as to exactly what you need. – Raman Sailopal Aug 14 '17 at 10:21
