17

I need to delete from row number 475 to 948 due to them being duplicates of rows 1-474. It would be something close to this, I presume, or is there more to it?

DELETE FROM dbo.industry WHERE row_number between 475 and 948
Tim Wilcox
  • 1,275
  • 2
  • 19
  • 43
  • Do you have a primary key, something like an ID in the database? If you do you can just delete by ID easy enough. – Michael Platt Oct 06 '17 at 18:53
  • 1
    Is `row_number` a column in your table? Are the values 475-948 in that column the rows you want to delete? – David Oct 06 '17 at 18:53
  • @MichaelPlatt, I have a compound primary key made of 9 fields. – Tim Wilcox Oct 06 '17 at 18:54
  • @David, row_number does not exist yet. Is it easy enough to create that? – Tim Wilcox Oct 06 '17 at 18:55
  • So this example from Microsoft docs shows how to select based on row number. You could tweak it to delete if you wanted. `SELECT ROW_NUMBER() OVER(PARTITION BY recovery_model_desc ORDER BY name ASC) AS Row#, name, recovery_model_desc FROM sys.databases WHERE database_id < 5;` – Michael Platt Oct 06 '17 at 18:56
  • 1
    @TimWilcox: Well, adding a column is easy, but is that really what you want? How do you define “row number”? How do you identify the records to delete? – David Oct 06 '17 at 18:57
  • @david, not sure about how I would define it. I would like to think that it could be row number of 1 for the first record and so on from there. As I am looking at the records, rows 475 to 948 are exact duplicates of rows 1-474. – Tim Wilcox Oct 06 '17 at 19:02
  • 1
    @TimWilcox: Well, an important thing to keep in mind is that what you're calling the "row number" depends *entirely* on how you *sort* the data. SQL itself doesn't really guarantee that. It's generally sorted by a primary key, but doesn't necessarily need to be. Records themselves don't have "row numbers". – David Oct 06 '17 at 19:04
  • @David, not sure how it is sorted. I just imported into SQL from a fixed width file. – Tim Wilcox Oct 06 '17 at 19:06
  • 1
    @TimWilcox: I think some of the answers below can help, but first you have to be able to identify what is to be deleted. The rows aren't sorted by any default, nor are they *technically* duplicates if the PK is different. At a technical, logical level, when looking at the data, can you define some expression which would identify the rows to be deleted? Or, conversely, the rows to be kept? If the former, that's your `DELETE` statement. If the latter, that's your `SELECT` statement at the start of Belgo's answer below. – David Oct 06 '17 at 19:16
  • @david, there is no primary key, come to find out, due to my importing them fresh and not into an existing table. I think that that I will simply re-import the table and be more careful. The duplication occurred when I tried to combine two tables into one and needless to say, that did not work. – Tim Wilcox Oct 06 '17 at 19:50

7 Answers7

22

May be it is too late, but I am usually doing this

; with cte(rownum)as(
    select row_number () over(partition by [Col1], [Col2] order by Col3) from [table]
)
delete from cte where rownum > 1
fly_ua
  • 1,034
  • 8
  • 12
2
DELETE FROM dbo.industry
WHERE COLUMN_NAME IN      -- Choose a column name
  (SELECT TOP 1000
     COLUMN_NAME,          -- Choose a column name
     ROW_NUMBER() OVER( ORDER by COLUMN_NAME ASC) AS Row_Number
   FROM dbo.industry
   WHERE Row_Number BETWEEN 475 AND 948 )

COLUMN_NAME can be any column name of your table u want.

Dr. X
  • 2,890
  • 2
  • 15
  • 37
1

If you are trying to delete using the Row_Number function, and you get an error of

Windowed functions can only appear in the SELECT or ORDER BY clauses

you can revise the SQL to have it in the select clause as in the example below:

Delete T 
From (Select Row_Number() Over(Partition By [IndustryType], [IndustryDescription] order By [ID]) As RowNumber,* 
From dbo.industry) T
Where T.RowNumber > 1
Kerry Jackson
  • 1,821
  • 12
  • 20
  • `Select Row_Number() Over(Partition By [IndustryType], [IndustryDescription] order By [ID]) As RowNumber,* From dbo.industry` throws an error for me in Snowflake and mySQL whereas `Select *, Row_Number() Over(Partition By [IndustryType], [IndustryDescription] order By [ID]) As RowNumber From dbo.industry` works. Strange! – Amit Pathak Mar 15 '22 at 10:42
1
ALTER TABLE dbo.industryADD COLUMN tmpRowNumber COUNTER
DELETE * FROM dbo.industry WHERE tmpRowNumber IS BETWEEN 475 AND 948
ALTER TABLE dbo.industry DROP COLUMN tmpRowNumber

Carefull though, your DBS may not consider row #1 as row #1 as tables are unsorted

Tingholm
  • 11
  • 2
0
SELECT DISTINCT *
INTO #Temp
FROM dbo.industry

DELETE FROM dbo.industry

INSERT INTO dbo.industry
SELECT *
FROM #Temp
BelgoCanadian
  • 893
  • 1
  • 11
  • 31
  • 1
    In the comments above the OP indicates that the table has a primary key. So this won't work. Every record will be re-inserted. – David Oct 06 '17 at 18:58
  • Agreed, but this might method can easily be tweaked to take that into account. The point is that it probably makes more sense to find the distinct values and recreate it than messing with row number matching. – BelgoCanadian Oct 06 '17 at 19:04
  • That's exactly the question... How to target the "distinct" records. The act of deleting them isn't the problem. It's a good attempt, but essentially you're hand-waving the actual problem. – David Oct 06 '17 at 19:05
0
DELETE FROM dbo.industry WHERE dbo.industry.
REPLACE WITH PK COLUMN NAME| IN (SELECT TOP 948 dbo.industry
REPLACE WITH  PK COLUMN NAME| FROM dbo.industry WHERE dbo.industry
REPLACE WITH  PK COLUMN NAME| > 475 ORDER BY dbo.industry 
REPLACE WITH  PK COLUMN NAME|)
Tim Wilcox
  • 1,275
  • 2
  • 19
  • 43
-4

This is not really an answer. There were a few issues with the data that made the answers above (while excellent) unrelated. I simply deleted the table and then re-imported it from fixed width. This time, I was more careful and did not have the duplication.

Tim Wilcox
  • 1,275
  • 2
  • 19
  • 43