Unix awk - count of occurrences for each unique value

Question

In Unix, I am printing the unique value for the first character in a field. I am also printing a count of the unique field lengths. Now I would like to do both together. Easy to do in SQL, but I'm not sure how to do this in Unix with awk (or grep, sed, ...).

PRINT FIRST UNIQ LEADING CHAR

awk -F'|' '{print substr($61,1,1)}' file_name.sqf | sort | uniq

PRINT COUNT OF FIELDS WITH LENGTHS 8, 10, 15

awk -F'|' 'NR>1 {count[length($61)]++}  END {print count[8] ", " count[10] ", " count[15]}' file_name.sqf | sort | uniq

DESIRED OUTPUT

first char, length 8, length 10, length 15
a, 10, , 150
b, 50, 43, 31
A, 20, , 44
B, 60, 83, 22

The fields that start with an upper or lower 'a' are never length 10.

The input file is a | delimited .sqf with no header. The field is varChar 15.

sample input

56789 | someValue | aValue | otherValue | 712345
46789 | someValue | bValue | otherValue | 812345
36789 | someValue | AValue | otherValue | 912345
26789 | someValue | BValue | otherValue | 012345
56722 | someValue | aValue | otherValue | 712345
46722 | someValue | bValue | otherValue | 812345

desired output

a: , , 2
b: 1, , 1
A: , , 1
B: , 1,

'a' has two instances that are length 15 'b' has one instance each of length 8 and 15 'A' has one instance that is length 15 'B' has one instance that is length 10

Thank you.

Please add sample input (file_name.sqf) for your desired output to your question. — Cyrus, Oct 22 '18 at 16:47
Where do you get "a .. length 15 'b' .. length 8" from the posted sample input? Every field in given column has the same length? — karakfa, Oct 22 '18 at 17:52

score 0 · Accepted Answer · answered Oct 22 '18 at 18:00

0

I think you need a better sample input file, but I guess that's what you're looking for

$ awk -F' \\| ' -v OFS=, '{k=substr($3,1,1); ks[k]; c[k,length($3)]++}
                      END {for(k in ks) print k": "c[k,6],c[k,10],c[k,15]}' file

A: 1,,
B: 1,,
a: 2,,
b: 2,,

note that since all lengths are 6, I printed that count instead of 8. With the right data you should be able to get the output you expect. Note however that the order is not preserved.

answered Oct 22 '18 at 18:00

karakfa

66,216
7
41
56

TOTALLY COOL! ............. I made minor adjustments to your code and this works ................ $ awk -F'|' -v OFS=, '{$3!=""} {k=substr($3,1,1); ks[k]; c[length($3)]++} END {for(k in ks) print k": "c[0],c[10],c[15]}' – user3439308 Oct 22 '18 at 18:27
ACTUALLY ........... now that I look closer ....... ALMOST ...... the problem is that my version of your code does not give a count for each leading character a, b, A, B. instead it sums all and gives the same count four times, but at least the column count is separated out. – user3439308 Oct 22 '18 at 18:34
1

SUGGESTION: post a realistic input file, test my script with the posted file. Nobody knows what you're testing, but based on my output it counts the lengths by first char. No one else knows what you're testing. – karakfa Oct 22 '18 at 18:41
DONE - YOUR CODE WAS ACTUALLY VERY CLOSE awk -F'|' -v OFS=, '{$3!=""} {k=substr($3,1,1); ks[k]; c[k,length($3)]++} END {for(k in ks) print k": "c[k,0],c[k,10],c[k,15]}' FILE_NAME.sqf – user3439308 Oct 22 '18 at 18:41
how can zero length appear for any given char? – karakfa Oct 22 '18 at 18:42
That counts the number of null value. Works perfectly, thank you so much. – user3439308 Oct 22 '18 at 20:05

Unix awk - count of occurrences for each unique value

1 Answers1