
I have a file (df.txt) with 3045 rows and 900,000 columns, of which 145 rows are repeated, for example:

    1234  1111122233330000000000003333311122222............................
    1235678 00000000000000000000000111111122222............................
    4567  1122222222222222222222223333333333333............................
    3456  111111111111111122222222222222222222............................
    1234 1111122233330000000000003333311122222............................
    1235678 00000000000000000000000111111122222............................
    3423 33333333300000000011111112222222222222............................
    2211 11111111111111111111111111111111111111............................

Thus, the new file (dffinal.txt) should not have repeated values in column 1, like this:

    1234  1111122233330000000000003333311122222............................
    1235678 00000000000000000000000111111122222............................
    4567  1122222222222222222222223333333333333............................
    3456  111111111111111122222222222222222222............................
    3423 33333333300000000011111112222222222222............................
    2211 11111111111111111111111111111111111111............................

I tried with

cat df.txt | sort | uniq > dffinal.txt

but it keeps the same number of rows.

Johanna Ramirez

1 Answer


You can use awk to check for duplicates in column 1.

awk '!a[$1] { a[$1]++; print }' df.txt > dffinal.txt

This remembers the value of the first column in the a array. If that value isn't already in the array, it saves it and prints the line, so only the first instance of each repeated key is printed.
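If you prefer a more compact form, the same logic is often written with a post-increment, which behaves identically to the command above:

awk '!a[$1]++' df.txt > dffinal.txt

Here a[$1]++ evaluates to 0 (false) the first time a key is seen, so the negation is true and the default action, printing the line, runs exactly once per key; on later occurrences the counter is non-zero and the line is skipped.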

Barmar