I have a file (df.txt) with 3045 rows and 900,000 columns, of which 145 rows are duplicates, e.g.:
1234 1111122233330000000000003333311122222............................
1235678 00000000000000000000000111111122222............................
4567 1122222222222222222222223333333333333............................
3456 111111111111111122222222222222222222............................
1234 1111122233330000000000003333311122222............................
1235678 00000000000000000000000111111122222............................
3423 33333333300000000011111112222222222222............................
2211 11111111111111111111111111111111111111............................
The new file (dffinal.txt) should have no repeated values in column 1, keeping the first occurrence of each row in its original order:
1234 1111122233330000000000003333311122222............................
1235678 00000000000000000000000111111122222............................
4567 1122222222222222222222223333333333333............................
3456 111111111111111122222222222222222222............................
3423 33333333300000000011111112222222222222............................
2211 11111111111111111111111111111111111111............................
I tried
cat df.txt | sort | uniq > dffinal.txt
but the output still has the same number of rows.
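Since the goal is to deduplicate on column 1 only (and keep the original row order, which `sort | uniq` would not), a common approach is an `awk` one-liner that remembers the first fields it has already seen. A minimal sketch, using a small stand-in for df.txt (the filenames match the question; the shortened data lines are illustrative):

```shell
# Build a small stand-in for df.txt (keys and data truncated for illustration)
printf '%s\n' \
  '1234 11111' \
  '1235678 00000' \
  '4567 11222' \
  '3456 11111' \
  '1234 11111' \
  '1235678 00000' \
  '3423 33333' \
  '2211 11111' > df.txt

# !seen[$1]++ is true only the first time a given column-1 value appears,
# so awk prints each key's first row and drops later repeats, preserving order.
awk '!seen[$1]++' df.txt > dffinal.txt
cat dffinal.txt
```

If the order of rows does not matter, `sort -u -k1,1 df.txt > dffinal.txt` also keeps exactly one line per column-1 key, but it reorders the file.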