1

Can someone help me get into the thinking of knowing how to fix data in SQL tables (by trying NOT to give me an SQL routines I could run).

Ok, this is the situation…. Suppose I have a single table with has a column called ColumnA which has lots of duplicate values. I need to remove all the duplicate entries from the table in question. Question is….if I had to write pseudo-code as a plan, what SQL should be written Many thanks to anyone who can offer me any pointers.

Kind Regards James

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
MasterSQL
  • 101
  • 7
  • Does have an identity column or primary key? – JBrooks Dec 08 '15 at 23:18
  • So you don't want an actual answer on how to perform this? You are wanting pseudo-code and then figure it out on your own? – SQLChao Dec 09 '15 at 00:32
  • Is your table ***relational*** - meaning it has no repeating groups and a designated *Primary Key* and is thus in 1NF? – Pieter Geerkens Dec 09 '15 at 04:34
  • Hi, yes I was hoping to just figure it out on my own. I've now heard of CTE which I'm going to try to learn next :) Rows could repeat more than once - All the duplicate rows have a unique primary key so the duplicate are in the other columns – MasterSQL Dec 09 '15 at 12:40

1 Answers1

0

I think you've already articulated a very basic psuedocode for the issue you describe in stating that you wish to delete duplicate values from column A.

For this example, I would tend to;

  1. Find all instances of duplicates

  2. Work out method of determining which one to keep (Google "MAX N in Group" for ideas) There are good articles here on SO and DBA Stackexchange also other external articles with examples

  3. Write your delete to cater for the records you identify as unwanted duplicates in the previous step

For me, when working through these types of issues in SQL Server, I tend to write a series of Common Table Expressions (CTE) to identify my target records and then delete based on that.

For example;

;WITH Duplicates AS (
  -- Write your select query to identify which subset of your records are affected by having duplicate values in Column A
), TargetRows AS (
  -- Further select from Duplicates some method of MAX N in Group to identify which of the rows are unwanted 
) -- Then here DELETE from your table based upon your findings from above
Community
  • 1
  • 1
Mr Moose
  • 5,946
  • 7
  • 34
  • 69
  • Hi Mr Moose I've found a solution using basic SQL now. Do you think I'd be better learning more about CTE instead :- SELECT SomeDate, NumValue, PatientID INTO TABLESTORE FROM TESTTABLE GROUP BY SomeDate, NumValue, PatientID HAVING COUNT(SomeDate)>1 AND COUNT(NumValue)>1 AND COUNT(PatientID)>1 -- could be 20 of these ORDER BY PatientID – MasterSQL Dec 09 '15 at 22:35
  • I wouldn't focus on learning CTEs for this issue specifically, but it can be useful. There are [many methods](https://www.mssqltips.com/sqlservertip/1918/different-strategies-for-removing-duplicate-records-in-sql-server/) for eliminating duplicates. The link I provided shows several but the problem with those examples is that it shows duplicates where ALL columns are the same. Given you mention in your latest comment that your duplicate rows have unique primary keys, you need to ensure that if these rows are referred to be any other tables, you need to make sure the data is re-associated. – Mr Moose Dec 10 '15 at 01:28
  • HI Mr Goose There a lot to study there from my view. At my level of knowledge its hard to know what will be useful for future and what will be a major benefit. Thanks again. – MasterSQL Dec 10 '15 at 21:14