10

I am using MS SQL database and I have a table named "Logs" that looks like this:

[screenshot of the Logs table]

Records are added one after another, so the table becomes quite big after a few days/weeks.

What I need to do periodically is a little bit of cleanup, i.e. an SQL query that deletes the older rows and keeps only the 100 most recent records in this table.

I agree it would be better to have it "delete records older than some_date"... It was like this before, but the client wants it different :( So... here we are.

BTW: I'm a little bit disappointed about people throwing negatives on this question. Is there something wrong with it or what?... Imagine: this question produced already 4 answers!!! ... and one guy decided to throw negative on that! Now I really don't know what to think... Strange people around here :(

asked by סטנלי גרונן · edited by Salman A
  • what do you need to do? delete old records or delete all records except the most recent 100? what happens if you get 101 records in a single second? does that mean you will then miss out on the first record? – Kritner Jul 06 '15 at 19:07
  • Is there an ID or anything that makes row unique? – J.S. Orris Jul 06 '15 at 19:08
  • 1
    disc space is so cheap these days I would argue with rows of that size, nothing would ever need to be deleted. If you ever ***did*** need to delete, I would not delete based on "leave this many records remaining", but rather "delete records older than x date." – Kritner Jul 06 '15 at 19:09
  • @Kritner : I don't understand your questions, but all I can say is: Usually there's a cleanup job that occurs somewhere at 3:00am, normally no new records arrive at that hour... And, yes I agree records should be deleted based on date rather than number :) – סטנלי גרונן Jul 07 '15 at 11:05
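A comment above mentions that the cleanup runs from a 3:00 am job; whichever delete is chosen, it can be packaged as a stored procedure for that job to call. A sketch only (`usp_TrimLogs` is a made-up name, and the body uses a `ROW_NUMBER`-based delete):

```sql
-- Sketch of a nightly cleanup wrapper; usp_TrimLogs is a placeholder name
CREATE PROCEDURE dbo.usp_TrimLogs
AS
BEGIN
    SET NOCOUNT ON;

    -- Number rows newest-first and delete everything past the 100th
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (ORDER BY [DateTime] DESC) AS rn
        FROM dbo.Logs
    )
    DELETE FROM numbered
    WHERE rn > 100;
END
```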

5 Answers

17

You can use one of the following:

-- offset clause
WITH goners AS (
    SELECT *
    FROM Logs
    ORDER BY DateTime DESC
    OFFSET 100 ROWS 
)
DELETE FROM goners

-- numbered rows
WITH goners AS (
    SELECT ROW_NUMBER() OVER(ORDER BY DateTime DESC) AS rn, Logs.*
    FROM Logs
)
DELETE FROM goners
WHERE rn > 100

-- nth value
-- note that this "keeps" all rows that tie for last place
DELETE FROM Logs
WHERE DateTime < (
    SELECT MIN(DateTime)
    FROM (
        SELECT TOP 100 DateTime
        FROM Logs
        ORDER BY DateTime DESC
    ) AS x
)
– Salman A
  • Couldn't the latter work with the [DateTime] column as is too? – Matt Jul 06 '15 at 19:37
  • @SalmanA : What if there are more records with DateTime = MIN(DateTime) ...? I guess it is possible to have less than 100 rows after executing your query (the first one I'm talking about). Right? – סטנלי גרונן Jul 07 '15 at 16:47
  • @groenhen: yes, the first approach does not guarantee that exactly 100 records will be kept. It is possible to have **more** than 100 rows (since we are using `<`) when e.g. row #100, 101 and 102 have same datetime. – Salman A Jul 07 '15 at 19:11
  • @groenhen see revised answer – Salman A Jul 07 '15 at 19:40
4

While I agree with others that this is probably not the way to go, here's a way to do it anyway:

;WITH keepers AS
(   SELECT TOP 100 [DateTime]
    FROM dbo.Logs
    ORDER BY [DateTime] DESC )
DELETE a FROM dbo.Logs a
WHERE NOT EXISTS ( SELECT 1 FROM keepers b WHERE b.[DateTime] = a.[DateTime] )
– Matt
3

Instead of using NOT EXISTS, just use >=:

WITH keepers AS (
    SELECT TOP 100 [DateTime]
    FROM dbo.Logs
    ORDER BY [DateTime] DESC
   )
DELETE FROM dbo.Logs
    WHERE [DateTime] < (SELECT MIN([DateTime]) FROM keepers);

I'm not sure whether there are lock settings under which new rows could be inserted while the delete is running; if there are, this approach would still be safe in that case.
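If concurrent inserts during the delete are the worry, one way to rule them out entirely (an editorial sketch, not something the answer requires) is to hold an exclusive table lock inside a transaction:

```sql
BEGIN TRANSACTION;

-- TABLOCKX takes an exclusive table lock, so no rows can be inserted
-- between computing the cutoff and finishing the delete
WITH keepers AS (
    SELECT TOP 100 [DateTime]
    FROM dbo.Logs WITH (TABLOCKX)
    ORDER BY [DateTime] DESC
)
DELETE FROM dbo.Logs
WHERE [DateTime] < (SELECT MIN([DateTime]) FROM keepers);

COMMIT TRANSACTION;
```

Of course this blocks all writers for the duration, which may be fine for a 3:00 am job but not under load.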

You can actually simplify this in SQL Server 2012+:

DELETE FROM dbo.Logs
    WHERE [DateTime] < (SELECT [DateTime]
                        FROM dbo.Logs
                        ORDER BY [DateTime] DESC
                        OFFSET 99 ROWS FETCH NEXT 1 ROW ONLY
                       );
– Gordon Linoff
  • Ok, it works after some minor corrections (missing parenthesis, etc.) – סטנלי גרונן Jul 07 '15 at 16:40
  • Would this work when the amount of data is about 40GB, I inherited a huge data where maximum is a single log table never cleaned for 2yr. Now have to clean this and save only a month data, moving rest to archive (via another copy) – NitinSingh Dec 18 '19 at 08:12
  • @NitinSingh . . . Yes. You would want indexes to speed the queries and if many rows are being deleted, you might want to take a different approach. – Gordon Linoff Dec 18 '19 at 11:31
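For the multi-gigabyte case in the last comment, the "different approach" usually means deleting in batches, so each transaction stays small and the log file doesn't balloon. A sketch (the 10,000 batch size is arbitrary):

```sql
-- Compute the cutoff once: the 100th most recent DateTime
DECLARE @cutoff DATETIME;
SELECT @cutoff = MIN([DateTime])
FROM (SELECT TOP 100 [DateTime] FROM dbo.Logs ORDER BY [DateTime] DESC) AS newest;

-- Delete older rows in chunks until none remain
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM dbo.Logs
    WHERE [DateTime] < @cutoff;

    IF @@ROWCOUNT = 0 BREAK;
END;
```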
1

This works for me:

;WITH cte AS (
    SELECT TOP (SELECT COUNT(*) - 100 FROM Logs) *
    FROM Logs
    ORDER BY [DateTime]
)
DELETE FROM cte
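One caveat worth noting (editorial, not from the original answer): `TOP` raises an error when its argument is negative, so the `COUNT(*) - 100` approach fails outright if the table holds fewer than 100 rows. A guarded variant, written against the question's `Logs` table:

```sql
;WITH cte AS (
    -- Clamp the row count at zero so TOP never goes negative
    SELECT TOP (SELECT CASE WHEN COUNT(*) > 100
                            THEN COUNT(*) - 100 ELSE 0 END
                FROM Logs) *
    FROM Logs
    ORDER BY [DateTime]
)
DELETE FROM cte
```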
– Giorgi Nakeuri
0
DECLARE @cutoff DATETIME
-- Row-by-row assignment leaves @cutoff holding the last value read,
-- i.e. the 100th most recent DateTime
SELECT TOP 100 @cutoff = [DateTime] FROM Logs ORDER BY [DateTime] DESC
DELETE FROM Logs WHERE [DateTime] < @cutoff
– Fruitbat