Questions tagged [uniq]

uniq is a Unix/POSIX/Linux utility that removes or filters adjacent duplicate lines, which is why its input is normally sorted first. It is typically applied to the output of sort.
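
A minimal sketch of why uniq is usually paired with sort. The sample data and the variable names (input, unsorted, deduped, repeated) are made up for illustration; only the standard sort and uniq utilities and the uniq -d option are assumed.

```shell
# Hypothetical sample data: duplicates exist, but none are adjacent.
input=$(printf 'b\na\nb\nc\na\n')

# uniq only collapses adjacent duplicates, so this leaves the input unchanged.
unsorted=$(printf '%s\n' "$input" | uniq)

# Sorting first makes duplicates adjacent, so uniq can collapse them.
deduped=$(printf '%s\n' "$input" | sort | uniq)

# uniq -d prints one copy of each line that occurs more than once.
repeated=$(printf '%s\n' "$input" | sort | uniq -d)

printf 'deduplicated:\n%s\n' "$deduped"
printf 'repeated lines:\n%s\n' "$repeated"
```

With this input, the sorted pipeline should keep one copy each of a, b and c, and uniq -d should report a and b.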

In Ruby, uniq is a method of the Array class that removes duplicates from an array. uniq creates a new array, whereas uniq! modifies the array in place.

For questions about unique identifiers, keys, names, etc., use a more specific tag instead.

454 questions
6 votes, 3 answers

how to aggregate counts in a bash one-liner

I often use sort | uniq -c to make count statistics. Now, if I have two files with such count statistics, I would like to put them together and add the counts. (I know I could append the original files and count there, but let's assume only the count…
benroth
6 votes, 1 answer

"Illegal Byte sequence" error while using shell commands in mac bash terminal

Getting an "illegal byte sequence" error while trying to extract non-English characters from a large file in the macOS bash shell. This is the script that I am trying to use: sed 's/[][a-z,0-9,A-Z,!@#\$%^&*(){}":/_-|. -][\;''=?]*//g' < $1…
Abhineet Prasad
6 votes, 4 answers

Merge results from uniq -c

I have many files with the results of the command: uniq -c some_file > some_file.out. For example, 1.out: 1 a 2 b 4 c and 2.out: 2 b 8 c. I would like to merge these results, so I get: 1 a 4 b 12 c. I thought that sort or uniq could handle it but I…
radarek
5 votes, 1 answer

Even after `sort`, `uniq` is still repeating some values

Reference file: http://snap.stanford.edu/data/wiki-Vote.txt.gz (it is a tape archive that contains a file called Wiki-Vote.txt). The first few lines of the file contain the following: head -n 10 Wiki-Vote.txt # Directed graph (each unordered…
SigSegV
5 votes, 1 answer

Use case for uniq, groupby without sorting

While debugging a Python programme, I recently discovered that the Python itertools#groupby() function requires the input collection to be sorted, because it only groups identical elements that occur in a sequence: Generally, the iterable needs to…
Carsten
5 votes, 3 answers

How to find duplicate lines in a file?

I have an input file with the following data: line1 line2 line3 begin line5 line6 line7 end line9 line1 line3 I am trying to find all the duplicate lines. I tried sort filename | uniq -c but it does not seem to be working for me. It gives me: 1…
Vicky
5 votes, 3 answers

Finding a uniq -c substitute for big files

I have a large file (50 GB) and I would like to count the number of occurrences of different lines in it. Normally I'd use sort bigfile | uniq -c but the file is large enough that sorting takes a prohibitive amount of time and memory. I could…
Charles
5 votes, 3 answers

Sort and keep a unique duplicate which has the highest value

I have a file like the one shown below. I want to keep the combinations of the first and second fields that have the highest value in the third field (the ones with the arrows; the arrows are not included in the actual file). 1 1 10 1 1 12 …
Tamalero
5 votes, 5 answers

What is the difference between 'sort -u' and 'uniq'?

I need a script that sorts a text file and removes the duplicates. Most, if not all, of the examples out there use the sort file1 | uniq > file2 approach. In man sort, though, there is a -u option that does this at the time of sorting. Is there a…
Stoinov
5 votes, 2 answers

bash add up columns with same first column

I have a file that has a name in the first column and count in the second column. It is sorted by name. dan 3355 dan 667 dan 889 frank 8 frank 99 frank 90 ian 9 I would like to combine all the same names and output the…
user1190650
5 votes, 3 answers

Bash output the line with highest value

My question is pretty much like this one but with one difference: I want to output the line that has the highest score in the 3rd column. My data is like: 1.gui Qxx 16 2.gui Qxy 23 3.guT QWS 11 and I want to get this: 1.gui Qxy 23 3.guT QWS …
teutara
5 votes, 1 answer

Calculate Word occurrences from file in bash

I'm sorry for the very noob question, but I'm kind of new to bash programming (started a few days ago). Basically what I want to do is keep one file with all the word occurrences of another file. I know I can do this: sort | uniq -c | sort the thing…
Epi
4 votes, 5 answers

Removing lines containing a unique first field with awk?

Looking to print only lines that have a duplicate first field. e.g. from data that looks like this: 1 abcd 1 efgh 2 ijkl 3 mnop 4 qrst 4 uvwx Should print out: 1 abcd 1 efgh 4 qrst 4 uvwx (FYI - first field is not always 1 character long in my…
Kyle
4 votes, 1 answer

How to get unique lines from a very large file in Linux?

I have a very large data file (255G; 3,192,563,934 lines). Unfortunately I only have 204G of free space on the device (and no other devices I can use). I did a random sample and found that in a given, say, 100K lines, there are about 10K unique…
Sir Robert
4 votes, 5 answers

How to completely erase the duplicated lines with Linux tools?

This question is not the same as How to print only the unique lines in BASH? because that one suggests removing all copies of the duplicated lines, while this one is about eliminating their duplicates only, i.e., change 1, 2, 3, 3 into 1, 2, 3…
Evandro Coan