
If we have the following result:

Operating System,50
Operating System,40
Operating System,30
Operating System,23
Data Structure,87
Data Structure,21
Data Structure,17
Data Structure,8
Data Structure,3
Crypo,33
Crypo,31
C++,65
C Language,39
C Language,19
C Language,4
Java 1.6,16
Java 1.6,11
Java 1.6,10
Java 1.6,2

I want to compare only the first field (the book name) and remove duplicate lines, keeping just the first line for each book, which holds the largest number. So the result would be as below:

Operating System,50
Data Structure,87
Crypo,33
C++,65
C Language,39
Java 1.6,16

Can anyone help me out with how to do this using the uniq, sort and cut commands? Maybe also using tr, head or tail?

eleven

4 Answers


The most elegant approach in this case would seem to be:

rev input | uniq -f1 | rev

(Note that uniq -f skips whitespace-separated fields, so this relies on every book name containing a space; a one-word title such as C++,65 has no second field and would be merged into the preceding group.)
sehe

This could be done in different ways, but I've tried to restrict myself to the tools you suggested:

cut -d, -f1 file | uniq | xargs -I{} grep -m 1 "{}" file

Alternatively, if you are sure that no two different book names share the same first three characters, you can simply use: uniq -w3 file (the -w option is a GNU extension). This tells uniq to compare no more than the first three characters of each line.
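As a quick sanity check (books.csv is a hypothetical file name holding the sample data; -w requires GNU uniq):

```shell
# Write the sample data from the question to a file (hypothetical name).
cat > books.csv <<'EOF'
Operating System,50
Operating System,40
Operating System,30
Operating System,23
Data Structure,87
Data Structure,21
Data Structure,17
Data Structure,8
Data Structure,3
Crypo,33
Crypo,31
C++,65
C Language,39
C Language,19
C Language,4
Java 1.6,16
Java 1.6,11
Java 1.6,10
Java 1.6,2
EOF

# GNU uniq: treat lines as duplicates when their first 3 characters match,
# keeping only the first line of each run.
uniq -w3 books.csv
```

This keeps exactly one line per book, since every run of duplicates is adjacent and each book's first three characters are distinct in the sample.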

dogbane
  • no, it's not possible without `xargs`. You need some way to call `grep` repeatedly. Another option would be to use a loop. – dogbane Oct 01 '12 at 16:58
awk -F, '{if(p!=$1)print;p=$1}' your_file
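The intent here is to print a line only when its first field differs from the previous line's, which works because the lines for each book are adjacent in the sample (note that awk variables are case-sensitive, so the variable name must be spelled consistently). A commented sketch of the same idea, with a short inline sample instead of a file:

```shell
# Print the first line of each adjacent run of identical book names.
# "prev" is this sketch's own variable name; input is supplied inline.
printf '%s\n' 'Operating System,50' 'Operating System,40' \
              'Data Structure,87' 'Data Structure,21' |
awk -F, 'prev != $1 { print }  # first line of a new book: print it
         { prev = $1 }         # remember the book name for the next line
'
```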
Vijay

If your input is sorted, you can use GNU awk like this:

awk -F, '!array[$1]++' file.txt

Results:

Operating System,50
Data Structure,87
Crypo,33
C++,65
C Language,39
Java 1.6,16

If your input is unsorted, you can use GNU awk like this:

awk -F, 'FNR==NR { if ($2 > array[$1]) array[$1]=$2; next } !dup[$1]++ { if ($1 in array) print $1 FS array[$1] }' file.txt{,}

Results:

Operating System,50
Data Structure,87
Crypo,33
C++,65
C Language,39
Java 1.6,16
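For what it's worth, file.txt{,} is Bash brace expansion for "file.txt file.txt", so awk reads the input twice: while FNR==NR (first pass) it records the largest count per book, and on the second pass it prints each book once. A commented sketch of the same two-pass idea (the file name and variable names are this sketch's own, and the sample here is deliberately unsorted):

```shell
# A small unsorted sample (hypothetical file name "books.csv").
cat > books.csv <<'EOF'
Data Structure,21
Operating System,40
Operating System,50
Data Structure,87
Crypo,31
Crypo,33
EOF

# Pass 1 (FNR==NR): remember the maximum count seen for each book.
# Pass 2: print each book once, with its recorded maximum.
awk -F, 'FNR==NR { if ($2+0 > max[$1]+0) max[$1] = $2; next }
         !seen[$1]++ { print $1 FS max[$1] }' books.csv books.csv
```

Because the maxima are collected before anything is printed, the result is correct regardless of input order; books appear in the order of their first occurrence.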
Steve