How to obtain only repeated lines for a specific column in bash

Question

Imagine I have this file in bash:

1 3 6 name1
1 2 7 name2
3 4 2 name1
2 2 2 name3
7 8 2 name2
1 2 9 name4

How could I extract only the lines whose "name" field is repeated, and sort them?

My expected output would be:

1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2

I tried sort -k4,4 myfile | uniq -D, but I can't figure out how to tell uniq to compare only the 4th column. Thanks!

Do the repetitions always occur only twice? – Cyrus Apr 17 '20 at 14:15
No, a name can be repeated any number of times. – Jeni Apr 17 '20 at 14:20

oguz ismail · Accepted Answer · 2020-04-17T15:39:21.127

3

You were close. You need to make uniq skip the fields preceding the last one, which is what -f3 does.

$ sort -k4 file | uniq -f3 -D
1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2
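
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data from the question and runs the same pipeline. The temporary file and the `input` and `result` variable names are illustrative assumptions, not part of the original answer. The `-s` (stable sort) flag and the explicit `-k4,4` key are added only to keep ties in input order, and `uniq -D` is assumed to be the GNU coreutils version.

```shell
# Recreate the sample input from the question in a temporary file
# (the file and variable names here are hypothetical).
input=$(mktemp)
printf '%s\n' \
  '1 3 6 name1' \
  '1 2 7 name2' \
  '3 4 2 name1' \
  '2 2 2 name3' \
  '7 8 2 name2' \
  '1 2 9 name4' > "$input"

# sort -k4,4 : order the lines by the 4th field only
# sort -s    : stable sort, lines with equal keys keep their input order
# uniq -f3   : ignore the first 3 fields when comparing adjacent lines
# uniq -D    : print every line of each duplicated group (GNU extension)
result=$(sort -s -k4,4 "$input" | uniq -f3 -D)

printf '%s\n' "$result"
rm -f "$input"
```

Because uniq only compares adjacent lines, the sort step has to come first; running uniq on the unsorted file would miss the repeated names.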

edited Apr 17 '20 at 15:39

answered Apr 17 '20 at 14:26

oguz ismail

1

Nice. My attempt was a little more confusing. With GNU grep: `sort -k 4,4 file | grep -Poz '([0-9]+ ){3}([^ ]+)\n(([0-9]+ ){3}\2\n)+'` – Cyrus Apr 17 '20 at 14:28

RavinderSingh13 · Answer 2 · 2020-04-17T14:29:29.773

2

Could you please try the following.

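awk '
{
  a[$NF]++                           # count how often each last field occurs
  b[$NF]=(b[$NF]?b[$NF] ORS:"")$0    # collect the lines that share that field
}
END{
  for(i in a){                       # iteration order is unspecified
    if(a[i]>1){                      # keep only fields seen more than once
      print b[i]
    }
  }
}
'  Input_file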

Or, if you want the output sorted, pipe it through sort:

awk '
{
  a[$NF]++
  b[$NF]=(b[$NF]?b[$NF] ORS:"")$0
}
END{
  for(i in a){
    if(a[i]>1){
      print b[i]
    }
  }
}
'  Input_file  |  sort -k4

edited Apr 17 '20 at 14:29

answered Apr 17 '20 at 14:22

RavinderSingh13

anubhava · Answer 3 · answered Apr 17 '20 at 14:23

1

You may use this awk + sort:

awk 'FNR==NR{freq[$NF]++; next} freq[$NF] > 1' file{,} | sort -k4

1 3 6 name1
3 4 2 name1
1 2 7 name2
7 8 2 name2
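
The one-liner above reads the file twice: in bash, `file{,}` expands to `file file`. The sketch below spells the two passes out without brace expansion so it also runs in a plain POSIX shell. The temporary file and the `input` and `result` names are assumptions made for illustration, and `-s -k4,4` is used instead of `-k4` only to make the tie order explicit.

```shell
# Recreate the sample input from the question in a temporary file
# (the file and variable names here are hypothetical).
input=$(mktemp)
printf '%s\n' \
  '1 3 6 name1' \
  '1 2 7 name2' \
  '3 4 2 name1' \
  '2 2 2 name3' \
  '7 8 2 name2' \
  '1 2 9 name4' > "$input"

# The same file is passed twice, so awk makes two passes over it.
# Pass 1: FNR == NR holds only while the first copy is being read,
#         so the last field of every line is counted in freq[].
# Pass 2: the bare condition freq[$NF] > 1 prints each line whose
#         last field was counted more than once.
result=$(awk 'FNR == NR { freq[$NF]++; next } freq[$NF] > 1' "$input" "$input" |
  sort -s -k4,4)

printf '%s\n' "$result"
rm -f "$input"
```

Reading the file twice keeps memory use proportional to the number of distinct names rather than to the number of lines, at the cost of a second pass; it therefore needs a regular file and will not work on piped input.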

answered Apr 17 '20 at 14:23

anubhava
