How can I remove lines that appear only once in a file in bash?

For example, file foo.txt has:

1
2
3
3
4
5

After processing the file, only

3
3

will remain.

Note that the file is already sorted.

mklement0
user200340

5 Answers

If your duplicated lines are consecutive, you can use uniq:

uniq -D file

From the man page:

-D print all duplicate lines
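
To see how this behaves on the question's sample data, here is a small sketch. It assumes GNU uniq, since -D is not part of POSIX uniq, and the function name show_dups is made up for illustration. The -d flag is shown only for contrast.

```shell
# Hypothetical helper: feeds the question's sample data to uniq with the given flag.
show_dups() {
  printf '%s\n' 1 2 3 3 4 5 | uniq "$1"
}

show_dups -D   # GNU extension: prints every line of each duplicated group (3, 3)
show_dups -d   # POSIX: prints one line per duplicated group (3)
```

For input that is not already sorted, the data would have to be piped through sort first, because uniq only compares adjacent lines.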

oliv

Just loop over the file twice:

$ awk 'FNR==NR {seen[$0]++; next} seen[$0]>1' file file
3
3
  • The first pass (while FNR==NR) counts how many times each line occurs; the array seen[], indexed by the whole record $0, keeps track of it.
  • The second pass prints the lines that appear more than once.
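
Because the second pass prints lines in file order, the same command can also be tried on unsorted input. A minimal sketch, assuming a temporary file created with mktemp and a made-up wrapper name keep_repeated:

```shell
# Hypothetical wrapper around the two-pass awk command above.
keep_repeated() {
  # Pass 1 (FNR==NR holds only while reading the first file argument) counts lines;
  # pass 2 prints each line whose count is greater than 1.
  awk 'FNR==NR {seen[$0]++; next} seen[$0]>1' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' b a c a b > "$tmp"   # unsorted sample data
keep_repeated "$tmp"               # prints b, a, a, b (input order kept)
rm -f "$tmp"
```

The input has to be a regular file rather than a pipe, since awk reads it twice.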
fedorqui

Using single pass awk:

awk '{freq[$0]++} END{for(i in freq) for (j=1; freq[i]>1 && j<=freq[i]; j++) print i}' file

3
3
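  • With freq[$0]++ we count and store the frequency of each line.
  • In the END block, if a line's frequency is greater than 1, we print that line as many times as it occurred. Note that for (i in freq) visits keys in an unspecified order, so distinct duplicated lines are not guaranteed to come out in input order.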
anubhava

Using awk, single pass:

$ awk 'a[$0]++ && a[$0]==2 {print} a[$0]>1' foo.txt
3
3

If the file is unordered, the output appears in the order in which duplicates are found in the file, because the solution does not buffer values.

James Brown

Here's a POSIX-compliant awk alternative to the GNU-specific uniq -D:

awk '++seen[$0] == 2; seen[$0] >= 2' file

This turned out to be just a shorter reformulation of James Brown's helpful answer.

Unlike uniq, this command doesn't strictly require the duplicates to be grouped, but the output order will only be predictable if they are.

That is, if the duplicates aren't grouped, the output order is determined by the relative ordering of the 2nd instances in each set of duplicates, and in each set the 1st and the 2nd instances will be printed together.
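
The ordering described above can be observed with a small ungrouped sample. This is only an illustrative sketch; the function name dups_stream and the sample data are made up.

```shell
# Hypothetical helper wrapping the awk command above; reads standard input.
dups_stream() {
  # First pattern: prints a line when its 2nd occurrence is seen (stands in for the 1st).
  # Second pattern: prints the 2nd and every later occurrence.
  awk '++seen[$0] == 2; seen[$0] >= 2'
}

printf '%s\n' a b a c b a | dups_stream   # prints a, a, b, b, a (one per line)
```

The first two occurrences of a appear together when the second a is read, the pair for b follows when the second b is read, and the third a is printed last.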

For unsorted (ungrouped) data (and if preserving the input order is also important), consider:

mklement0