0

I am tasked to create a stored procedure for a publisher application to retrieve employee data from our SQL Server.

The aim: We have EmpTable (lets call this the source). We also have EmpData table. It has a column called 'Status' with a default value: 'UNPROCESSED'. Our aim is to create a SP(lets call this SP1) so that: it polls all the data from the EmpTable in batches of 100 rows. It should then continue polling data from this table until SP returns zero records. Once the above completed processing, another SP(lets call this SP2) is created/called to change the status to 'COMPLETED'. If the record processing is failed permanently in the app, that is due to errors such as schema or validation errors (none retryable), SP2 to change the status to FAILED. The results is then populated in the EmpData table We aim to run this batch job once a day. Hope this make sense

I am wondering how this can be queried. I started the query:

DECLARE @id_check INT
DECLARE @batchSize INT
DECLARE @results INT

SET @results = 1 --stores the row count after each successful batch
SET @batchSize = 100 --How many rows you want to operate on each batch
SET @id_check = 0 --current batch 

-- when 0 rows returned, exit the loop
WHILE (@results > 0) 
BEGIN
   SELECT * -- This is just an example to generalize result for now
   FROM empdata

   SET @results = @@ROWCOUNT

   -- next batch
   SET @id_check = @id_check + @batchSize
END

The result I am aiming for is to return batch 1 to return 100 values, then batch 2 to return the next 100 and so on

Any help would be appreciated!

BP1109
  • 31
  • 3
  • 2
    So you are going to call the SP externally multiple times? Or you are going to query within the SP multiple times? I guess you must be calling the SP multiple times, because running the query multiple times in the SP is not going to return to your calling app until its finished. – Dale K Aug 19 '22 at 00:34
  • you may want to use triggers instead of trying to poll. here's some reference documentation on it: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-trigger-transact-sql?view=sql-server-ver16 – Taugenichts Aug 19 '22 at 01:06
  • Hi Dale, yeah SP will be called externally multiple times until all data within the table is retrieved. – BP1109 Aug 19 '22 at 01:07
  • Does the polling query have a `WHERE` clause? i.e. is it pulling unprocessed records or similar? Because if so, all you need to provide is the batch size and keep going until no records are returned based on the `WHERE` clause. – Dale K Aug 19 '22 at 01:14
  • @DaleK - only WHERE clause indicated is within SELECT statement of the emp data to only include active employees. Ideally yes it should pull only unprocessed records Are you able to provide me with the query you think will suit this? Appreciate your help! Taugenichts - what was asked for to do is to poll data unfortunately and with my little knowledge in sql, im not confident with triggers either Thanks! – BP1109 Aug 19 '22 at 01:48
  • I can't write a query because you haven't shown your requirements fully yet... how do you know whether a record has been processed or not? Are you saying that every time you run this batch job you want to retrieve all active employees and that you run this batch job on a regular schedule? What do you actually do the the records you process? – Dale K Aug 19 '22 at 01:50
  • Sorry if it wasn't clear, here is what the aim so far: We have EmpTable (lets call this the source). We also have EmpData table It has a column called 'Status' with a default value: 'UNPROCESSED'. Our aim is to create a SP(lets call this SP1) so that: it polls data from the EmpTable in batches of 100 rows. It should then continue polling data from this table until SP returns zero records. That is until all available records are retrieved. – BP1109 Aug 19 '22 at 02:47
  • Once the above completed processing, another SP(lets call this SP2) is created/called to change the status to 'COMPLETED'. If the record processing is failed permanently in the app, that is due to errors such as schema or validation errors (none retryable), SP2 to change the status to FAILED. The results is then populated in the EmpData table We aim to run this batch job once a day. Hope this make sense – BP1109 Aug 19 '22 at 02:47
  • Firstly, [edit] all clarifications directly into your question. Secondly your architecture as described seems a bit odd... SP1 is used only to pull records? But you call it until all records are loaded into your app? That pretty much defeats the purpose of batch loading? Surely you should load a batch, fully process them, then load your next batch? – Dale K Aug 19 '22 at 03:28
  • Further you still haven't explained how your query in SP1 knows what records to pull..., guessing, based on what you've said, you want to pull all EmpTable records which don't have a corresponding entry in EmpData? – Dale K Aug 19 '22 at 03:29
  • Thanks @DaleK! This is what I am after so far. Appreciate your help on this! – BP1109 Aug 23 '22 at 02:18

1 Answers1

0

Unfortunately without clear requirements its hard to assist you.

However, if what you want is to pull all the matching records, but in batches (which doesn't make a lot of sense), then you can use the following stored procedure.

It will return the first 100 rows which meet whatever criteria you have. Then when they are loaded in your app, you call the SP again, passing in the maximum ID you received in the previous recordset.

Your app continues to call this SP until no rows are returned.

CREATE OR ALTER PROCEDURE dbo.MyTestProcedure1
(
    -- Pass in the last ID processed
    @LastIdChecked int = 0
)
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    SET @LastIdChecked = COALESCE(@LastIdChecked,0);

    DECLARE @BatchSize int = 100;

    -- Return the next @BatchSize records after the last Id checked
    SELECT TOP(@BatchSize) *
    FROM dbo.EmpTable
    WHERE Id > @LastIdChecked
    ORDER BY Id ASC;

    RETURN 0;
END;

A more expected process would be, you use the SP to pull your first 100 records, then you fully process them in your app. Then you call the SP again, and the SP knows which records have been processed and filters them out in the WHERE clause. Then you run this until completion.

That solution would look like this:

CREATE OR ALTER PROCEDURE dbo.MyTestProcedure2
AS
BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    DECLARE @BatchSize int = 100;

    -- Return the next @BatchSize records
    SELECT TOP(@BatchSize) *
    FROM dbo.EmpTable T
    WHERE {Record is unprocessed}
    -- e.g. if you are expeceting 1 record a day per dbo.EmpTable record.
    -- WHERE NOT EXISTS (
    --    SELECT 1
    --    FROM dbo.EmpData D
    --    WHERE T.EmpId = D.EmpId AND D.Date = CONVERT(date, GETDATE())
    --)
    ORDER BY Id ASC;

    RETURN 0;
END;
Dale K
  • 25,246
  • 15
  • 42
  • 71