0

I have several columns including: device_id, time and type, now I want to delete repeated data.

Here is my data example:

+-----------+------------+------+
| device_id |    time    | type |
+-----------+------------+------+
| dev 1     | 2020/06/17 | A    |
| dev 1     | 2020/06/17 | B    |
| dev 2     | 2020/06/18 | B    |
| dev 2     | 2020/06/18 | B    |
| dev 3     | 2020/06/17 | A    |
| dev 3     | 2020/06/17 | A    |
+-----------+------------+------+

I want the output following the rules:

  1. there are A and B type in dev 1 in 2020/6/17, keep A
  2. there are only B type in dev 2 in 2020/6/18, keep B
  3. there are only A type in dev 3 in 2020/6/17, keep A

And after deleting, my expected output is:

 device_id |  time      | type
-----------+------------+------
 dev 1     | 2020/06/17 | A
 dev 2     | 2020/06/18 | B
 dev 3     | 2020/06/17 | A

Is there any delete statement can do the job? Any suggestion is welcome!

if there is any part of my question not much clear,

please let me know, thank you.

Additional details:

  1. This question is similar to me, but I need the delete statement and postgresql version
  2. There are about 90 million data in my db (IOT data), so if the SQL statement is high efficiency would be welcome
Roy Kuo
  • 43
  • 1
  • 5

1 Answers1

0

After discussing with my team leader and take a reference from chapter:Deleting duplicate rows using subquery

Finally I get my resolution as follows:

DELETE FROM My_table
WHERE id IN 
    (SELECT t.id
     FROM
        (SELECT id,
         ROW_NUMBER() OVER( PARTITION BY device_id,
             time
        ORDER BY type) AS row_num
        FROM My_table) t
    WHERE t.row_num > 1);

I think the key is to find the unique object (ex: id) from repeated rows

and use it as searching criteria.

Because there are many data in my DB, I set the index keys including device_id, time and type.

If anyone has better answer, please share it with me, thanks!

Roy Kuo
  • 43
  • 1
  • 5