The problem
I have a sql database with a table for Hashtags, of which many are duplicates with regard to their names.
A statement like
SELECT *
FROM HashTag
ORDER BY Name
returns something like
Id | Name
1947 | test
1950 | sample
1962 | test
1963 | sample
1986 | test
2014 | example
I want to keep only the hashtag with the lowest Id for each Name (1947 for 'test' and 1950 for 'sample') and update other tables with this Id, replacing the higher Ids (example: updating hashtag 'test'; lowest Id = 1947, higher Ids = 1962, 1986). These sql statements are updated manually as of now and would be as follows:
UPDATE HashTaggedActivity
SET [HashTag_id] = 1947
WHERE HashTag_id in (1962, 1986)
Update HashTaggedGroup
SET [HashTag_id] = 1947
WHERE HashTag_id in (1962, 1986)
DELETE ht
FROM HashTag ht
WHERE ht.Id in (1962, 1986)
After this I have to do this for HashTag 'sample', which is an error prone and tedious process. The HashTag 'example' is not a duplicate and should not result in updating other tables.
Is there any way to write an sql statement for doing this for each occurence of duplicate names in the table HashTag?
What I've tried so far
I think I have to combine a statement for getting a duplicate count ordered by Id
select ht.Id, ht.Name, htc.dupeCount
from HashTag ht
inner join (
SELECT ht.Name, COUNT(*) AS dupeCount
FROM HashTag ht
GROUP BY ht.Name
HAVING COUNT(*) > 1
) htc on ht.Name = htc.Name
ORDER BY Id
which gives
Id | Name | dupeCount
1947 | test | 3
1950 | sample | 2
1962 | test | 3
1963 | sample | 2
1986 | test | 3
2014 | example | 1
with my UPDATE and DELETE statements according to the dupeCount, but I'm not sure how to do this ;-)
Thanks in advance and best regards,
Michael