8

Given the following table

 123456.451 entered-auto_attendant
 123456.451 duration:76 real:76
 139651.526 entered-auto_attendant
 139651.526 duration:62 real:62
 139382.537 entered-auto_attendant 

Using a bash shell script on Linux, I'd like to delete the duplicate rows based on the value of column 1 (the one with the long number), taking into account that this number varies.

I've tried with

awk '{a[$3]++}!(a[$3]-1)' file

sort -u | uniq

But I am not getting the result I want, which would be something like this: compare all the values of the first column, delete the duplicates, and show the remaining rows:

 123456.451 entered-auto_attendant
 139651.526 entered-auto_attendant
 139382.537 entered-auto_attendant 
mklement0
user3494949

4 Answers

8

You didn't give an expected output; does this work for you?

 awk '!a[$1]++' file

with your data, the output is:

123456.451 entered-auto_attendant
139651.526 entered-auto_attendant
139382.537 entered-auto_attendant
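
In case the one-liner looks cryptic: !a[$1]++ is true only the first time a given column-1 value is seen, so only the first line per key gets printed. The same logic spelled out (just a sketch):

 awk '{
     if (!seen[$1]) {    # first time this column-1 value appears
         seen[$1] = 1
         print           # keep only the first line for each key
     }
 }' file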

And this one prints only the lines whose column-1 value is unique:

 awk '{a[$1]++;b[$1]=$0}END{for(x in a)if(a[x]==1)print b[x]}' file

output:

139382.537 entered-auto_attendant
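
The same logic written out, in case it helps (a sketch; note that for-in order in awk is unspecified, so the output order may vary when more than one key qualifies):

 awk '{ count[$1]++; line[$1] = $0 }   # tally each column-1 value and remember its line
      END {
          for (k in count)             # go over every key seen
              if (count[k] == 1)       # keep only values that occurred exactly once
                  print line[k]
      }' file
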
Kent
6

uniq, by default, compares the entire line. Since your lines are not identical, they are not removed.

You can use sort to conveniently sort by the first field and also delete duplicates of it:

sort -t ' ' -k 1,1 -u file
  • -t ' ': fields are separated by spaces
  • -k 1,1: only look at the first field
  • -u: delete duplicates

Additionally, you might have seen the awk '!a[$0]++' trick for deduplicating lines. You can make this dedupe on the first column only using awk '!a[$1]++'.
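
If you ever need to dedupe on more than one column, the sort approach extends naturally; for example (a sketch, assuming space-separated fields), to dedupe on the first two columns together:

 sort -t ' ' -k 1,2 -u file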

that other guy
  • Upvoting this answer as I think it's a bit more flexible. You could dedupe across multiple fields, for example. That's harder to do with awk. – catch22 Jul 11 '23 at 02:58
1

Using awk:

awk '!($1 in a){a[$1]++; next} $1 in a' file
123456.451 duration:76 real:76
139651.526 duration:62 real:62
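
Note that this prints only the second and later occurrences of each column-1 value, not one line per value. The same logic spelled out (a sketch):

 awk '!($1 in a) { a[$1]++; next }   # first occurrence of a key: remember it and skip
      { print }                      # any later occurrence is a duplicate, so print it
 ' file
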
jaypal singh
anubhava
  • Good, but I'd like to have all the records that start with the same first column, like in the description; in that case there are 2 records with the same first column, but sometimes there may be three or more – user3494949 Apr 03 '14 at 22:33
  • Isn't that what this answer is already doing? It is printing all the duplicate lines. What is your expected output? – anubhava Apr 03 '14 at 22:38
1

Try this command:

awk '!x[$1]++ { print $1, $2 }' file
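
With the sample data this keeps the first line for each column-1 value and prints only its first two fields:

123456.451 entered-auto_attendant
139651.526 entered-auto_attendant
139382.537 entered-auto_attendant
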
J. Chomel