What is the best way to do this? I have a 250 GB text file with one word per line.
Input:
123
123
123
456
456
874
875
875
8923
8932
8923
Desired output:
123
456
874
875
8923
8932
I need to keep one copy of each duplicated line. If the same line appears twice, I do NOT want to remove both copies; I just want to remove the extra one, always keeping exactly one unique copy of every line.
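To make the requirement concrete: for a small file, this is the classic awk idiom below, which keeps the first occurrence of every line. It's only a sketch of what I mean; the seen array lives in memory, so it can't work on a 250 GB file.

$ awk '!seen[$0]++' final.txt > finalnoduplicates.txt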
What I do now:
$ cat final.txt | sort | uniq > finalnoduplicates.txt
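From the man page, GNU sort can do the uniq step itself with -u, and -T/-S let me point its temp files at a disk with enough free space and cap its memory buffer. A sketch, assuming GNU coreutils; /bigdisk/tmp is a placeholder for a directory with roughly the input's size in free space:

$ sort -u -S 50% -T /bigdisk/tmp final.txt -o finalnoduplicates.txt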
I'm running this in a screen session. Is it working? I don't know, because when I check the size of the output file, it's 0:
123user@instance-1:~$ ls -l
total 243898460
-rw-rw-r-- 1 123user 249751990933 Sep 3 13:59 final.txt
-rw-rw-r-- 1 123user 0 Sep 3 14:26 finalnoduplicates.txt
123user@instance-1:~$
But when I check htop, the CPU usage of the command running in that screen is at 100%.
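My guess is that sort produces no output at all until it has read the whole input, so a 0-byte output file mid-run may be normal. One way I thought of to check progress (assuming GNU sort and its default temp directory /tmp, both assumptions on my part) is to watch the temporary spill files it creates:

$ ls -lh /tmp/sort*          # sort's temporary spill files (name pattern may vary)
$ watch -n 60 'du -sh /tmp'  # if this keeps growing, it's still working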
Am I doing something wrong?