
If we have the following result:

Operating System,50
Operating System,40
Operating System,30
Operating System,23
Data Structure,87
Data Structure,21
Data Structure,17
Data Structure,8
Data Structure,3
Crypo,33
Crypo,31
C++,65
C Language,39
C Language,19
C Language,4
Java 1.6,16
Java 1.6,11
Java 1.6,10
Java 1.6,2

I want to compare only the first field (the book name) and remove duplicate lines, keeping just the first line for each book, which holds the largest number. So the result would be as below:

Operating System,50
Data Structure,87
Crypo,33
C++,65
C Language,39
Java 1.6,16

Can anyone help me out with how to do this using the uniq, sort and cut commands? Maybe also using tr, head or tail?

eleven

4 Answers


The most elegant approach in this case would seem to be:

rev input | uniq -f1 | rev

(Note that uniq -f skips whitespace-separated fields, so this relies on every book name containing a space; a one-word title such as C++,65 has no second field and would be merged into the preceding group.)
sehe

This could be done in different ways, but I've tried to restrict myself to the tools you suggested:

cut -d, -f1 file | uniq | xargs -I{} grep -m 1 "{}" file

Alternatively, if you are sure that no two different book names share the same first three characters, you can simply use: uniq -w3 file (the -w option is a GNU extension). This tells uniq to compare no more than the first three characters of each line.
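As a quick sanity check (books.csv is a hypothetical file name holding the sample data; -w requires GNU uniq):

```shell
# Write the sample data from the question to a file (hypothetical name).
cat > books.csv <<'EOF'
Operating System,50
Operating System,40
Operating System,30
Operating System,23
Data Structure,87
Data Structure,21
Data Structure,17
Data Structure,8
Data Structure,3
Crypo,33
Crypo,31
C++,65
C Language,39
C Language,19
C Language,4
Java 1.6,16
Java 1.6,11
Java 1.6,10
Java 1.6,2
EOF

# GNU uniq: treat lines as duplicates when their first 3 characters match,
# keeping only the first line of each run.
uniq -w3 books.csv
```

This keeps exactly one line per book, since every run of duplicates is adjacent and each book's first three characters are distinct in the sample.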

dogbane
  • no, it's not possible without `xargs`. You need some way to call `grep` repeatedly. Another option would be to use a loop. – dogbane Oct 01 '12 at 16:58
awk -F, '{if(p!=$1)print;p=$1}' your_file
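The intent here is to print a line only when its first field differs from the previous line's, which works because the lines for each book are adjacent in the sample (note that awk variables are case-sensitive, so the variable name must be spelled consistently). A commented sketch of the same idea, with a short inline sample instead of a file:

```shell
# Print the first line of each adjacent run of identical book names.
# "prev" is this sketch's own variable name; input is supplied inline.
printf '%s\n' 'Operating System,50' 'Operating System,40' \
              'Data Structure,87' 'Data Structure,21' |
awk -F, 'prev != $1 { print }  # first line of a new book: print it
         { prev = $1 }         # remember the book name for the next line
'
```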
Vijay

If your input is sorted, you can use GNU awk like this:

awk -F, '!array[$1]++' file.txt

Results:

Operating System,50
Data Structure,87
Crypo,33
C++,65
C Language,39
Java 1.6,16

If your input is unsorted, you can use GNU awk like this:

awk -F, 'FNR==NR { if ($2 > array[$1]) array[$1]=$2; next } !dup[$1]++ { if ($1 in array) print $1 FS array[$1] }' file.txt{,}

Results:

Operating System,50
Data Structure,87
Crypo,33
C++,65
C Language,39
Java 1.6,16
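For what it's worth, file.txt{,} is Bash brace expansion for "file.txt file.txt", so awk reads the input twice: while FNR==NR (first pass) it records the largest count per book, and on the second pass it prints each book once. A commented sketch of the same two-pass idea (the file name and variable names are this sketch's own, and the sample here is deliberately unsorted):

```shell
# A small unsorted sample (hypothetical file name "books.csv").
cat > books.csv <<'EOF'
Data Structure,21
Operating System,40
Operating System,50
Data Structure,87
Crypo,31
Crypo,33
EOF

# Pass 1 (FNR==NR): remember the maximum count seen for each book.
# Pass 2: print each book once, with its recorded maximum.
awk -F, 'FNR==NR { if ($2+0 > max[$1]+0) max[$1] = $2; next }
         !seen[$1]++ { print $1 FS max[$1] }' books.csv books.csv
```

Because the maxima are collected before anything is printed, the result is correct regardless of input order; books appear in the order of their first occurrence.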
Steve