How do I setup a daily archive job in SQL server to keep my DB small and quick?

Question

I have a DB in SQL server and one of the tables recieves a large amount of data every day (+100 000). The data is reported on, but my client only needs it for 7 days. On the odd occasion he will require access to historic data, but this can be ignored.

I need to ensure that my primary lookup table stays as small as can be (so that my queries are as quick as possible), and any data older than 7 days goes into a secondary (archiving) table, within the same database. Data feeds in consistently to my primary table throughout the day from a variety of data sources.

How would I go about performing this? I managed to get to the code below through using other questions, butI am now recieving an error ("Msg 8101, Level 16, State 1, Line 12 An explicit value for the identity column in table 'dbo.Archived Data Import' can only be specified when a column list is used and IDENTITY_INSERT is ON. ").

Below is my current code:

    DECLARE @NextIDs TABLE(IndexID int primary key)
DECLARE @7daysago datetime
SELECT @7daysago = DATEADD(d, -7, GetDate())

WHILE EXISTS(SELECT 1 FROM [dbo].[Data Import] WHERE [Data Import].[Receive Date] < @7daysago)
BEGIN 
    BEGIN TRAN 

    INSERT INTO @NextIDs(IndexID)
        SELECT TOP 10000 IndexID FROM [dbo].[Data Import] WHERE [Data Import].[Receive Date] < @7daysago

    INSERT INTO [dbo].[Archived Data Import]
        SELECT * 
        FROM  [dbo].[Data Import] AS a
        INNER JOIN @NextIDs AS b ON a.IndexID = b.IndexID

    DELETE [dbo].[Data Import]
    FROM  [dbo].[Data Import] 
    INNER JOIN @NextIDs AS b ON a.IndexID = b.IndexID 

    DELETE FROM @NextIDs

    COMMIT TRAN
END

What am I doing wrong here? Im using SQL server 2012 Express, so cannot partition (which would be ideal).

Beyond this, how do I turn this into a daily recurring task? Any help would be much appreciated.

If you do some research on `IDENTITY_INSERT` you'll find your answer. You need to add this code to your SP to tell it not to autogenerate. Alternatively remove the IDENTITY property and you won't need to. — Nick.Mc, Oct 15 '14 at 10:06

score 1 · Accepted Answer · edited Oct 15 '14 at 11:44

1

An explicit value for the identity column in table 'dbo.Archived Data Import' can only be specified when a column list is used and IDENTITY_INSERT is ON

So... set identity insert on. Also, use DELETE ... OUTPUT INTO ... rather than SELECT -> INSERT -> DELETE.

DECLARE @7daysago datetime
SELECT @7daysago = DATEADD(d, -7, GetDate());
SET IDENTITY_INSERT [dbo].[Archived Data Import] ON;

WITH CTE as (
    SELECT TOP 10000 *
    FROM [dbo].[Data Import] 
    WHERE [Data Import].[Receive Date] < @7daysago)
DELETE CTE
  OUTPUT DELETED.id, DELTED.col1, DELETED.col2, ... 
  INTO  [dbo].[Archived Data Import] (id, col1, col2, ....);

Beyond this, how do I turn this into a daily recurring task?

Use conversation timers and activated procedures. See Scheduling Jobs in SQL Server Express.

edited Oct 15 '14 at 11:44

Chen_Dogg

91
1
12

answered Oct 15 '14 at 10:07

Remus Rusanu

288,378
40
442
569

This worked like a dream. in processing chunks of 1 million the queries run over about 10 minutes. It is a very wide table though, about 35 coloumns, mostly varchar(200)'s. – Chen_Dogg Oct 15 '14 at 11:45
1

1 mil is a bit big chunk for my taste, but is your call. I suggest you play with the size and find a sweep spot. Make sure the DB log file is big enough to accommodate the chunk transaction w/o causing (incremental) log growth. – Remus Rusanu Oct 15 '14 at 11:48
Yup, fair comment. Hit my log limit, so some adjustments are taking place. Thanks for the help. – Chen_Dogg Oct 16 '14 at 08:01

score 0 · Answer 2 · answered Oct 15 '14 at 10:00

0

Without seeing your Table definitions, I am going to assume that your archive table has the same definition as your current table. Am I right in assuming that You have an identity column as Archived Data Import.IndexID? If so, switch it to ba an int large enough to hold expected values.

In order to schedule, this you will need to create a bat file to run this procedure and schedule it with windows scheduler.

answered Oct 15 '14 at 10:00

DB101

633
4
8

Yes, my apologies. My Archived Table is identical but contains an additional coloumn that is ArchiveID, basically the new primary key. Will try change that Int quick. – Chen_Dogg Oct 15 '14 at 10:07
You should also specify all your column names in the Select and Insert clauses. This would be good practice for a variety of reasons especially when there are differences between the tables. – DB101 Oct 15 '14 at 10:09
Hmmm... It is an int, which allows it to go up to 2 147 483 647 which I am nowhere near at this stage. I tried changing it to a string (varchar 50) but still the same error. – Chen_Dogg Oct 15 '14 at 10:12
Leave it as an int, just remove the identity property. That is the part that is causing the issue. – DB101 Oct 15 '14 at 10:14
The following thread should help you schedule your procedure via the scheduler: http://stackoverflow.com/questions/14321023/executing-a-stored-procedure-using-windows-task-scheduler – DB101 Oct 15 '14 at 10:22
The following thread will help you remove the identity property: http://sqlmag.com/t-sql/removing-identity-property-existing-column – DB101 Oct 15 '14 at 10:24

How do I setup a daily archive job in SQL server to keep my DB small and quick?

2 Answers2