0

I want to select and then delete a list of entries in my tables that have case-insensitive duplications.

In other words, there are these rows that are unique... ..but they're not unique if you ignore case factor in case. They got in while I wasn't watching.

So how can I select against the column to find the ids that I should delete? (I'm fine with deleting both duplications).

simple sample column structure:

player_id | uname
------------------
34        | BOB
544       | bob
etc...
OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
Kzqai
  • 22,588
  • 25
  • 105
  • 137
  • To be more specific, you don't want to "delete a list of entries" but rather, you want to keep exactly one of the entries, right? – Andy Lester Oct 14 '10 at 23:52
  • Nah, I was okay with deleting none of the duplicates, just for ease of data-integrity, but keeping only one works fine as well. – Kzqai Oct 15 '10 at 00:39
  • ***meant was okay with deleting all of the duplicates, but yeah, whatever. Moot point now. *smiles* – Kzqai Oct 15 '10 at 00:53
  • Once you've cleaned up your data, I would recommend adding a "CREATE UNIQUE INDEX index_xyz ON table_abc (upper(uname)). This will prevent it from happening again plus it will give you an effective case-insensitive index if you use WHERE upper(uname) in a query. – Matthew Wood Oct 15 '10 at 15:34

1 Answers1

2

Players to keep (assuming they registered first)

SELECT min(player_id) as player_id
FROM players
GROUP BY lower(uname)

Use it to dislay the users to remove and their corresponding keeper.

SELECT 
    players.player_id delete_id,
    players.uname delete_uname,
    keepers.uname keeper_uname,
    keepers.player_id keeper_id    
FROM players JOIN 
    (
        SELECT p.player_id, p.uname
        FROM players p JOIN
        (
            SELECT min(player_id) player_id
              FROM players
          GROUP BY lower(uname)
        ) as keeper_ids
        ON (p.player_id = keeper_ids.player_id)     
    ) as keepers
    ON (lower(players.uname) = lower(keepers.uname) AND players.player_id <> keepers.player_id)
ORDER BY keepers.player_id, players.player_id 

Output:

delete_id | delete_uname | keeper_uname | keeper_id
---------------------------------------------------
544       | bob          | BOB          | 34
kevpie
  • 25,206
  • 2
  • 24
  • 28
  • FYI: This SQL hasn't been tested. – kevpie Oct 15 '10 at 00:07
  • Hmmm, that output works great... ...just not sure how to turn it into a deletion. – Kzqai Oct 15 '10 at 00:43
  • using the maximum player_id, I just used this a few times: delete from players where player_id in (SELECT max(player_id) as player_id FROM players GROUP BY lower(uname) having count(uname)>1); Thanks for pointing it out, I should have seen it before. – Kzqai Oct 15 '10 at 00:47
  • Great! Keep in mind you may have more than 2 in mixed case. Bob BOB bob, this may affect the max logic. – kevpie Oct 15 '10 at 00:51