0

I'm developing an application using C#, part of which will download data from my SQL database and send it to the printer. When printing is complete, it will mark the data as already printed, so the same data won't be printed more than once.

It works fine with one computer/printer, but what if I have this application running on multiple client computers? It seems that after one client downloads the data, but before the data is marked as complete, another client will think it hasn't been downloaded and will download the same data.

Is there a way to prevent this? I'm thinking maybe I could tag the data when it is being downloaded, or make sure only one query can be executed at a time?

Dharman

2 Answers

1

Add some form of “PrintStatus” column (a rough T-SQL sketch follows the list below).

  • When data is added, it is set to (say) 1, “Not Printed”.
  • When a client queries for data to print, only return data where PrintStatus = 1
  • When a client picks up the data to print, set it to (say) 2, “Being Printed”
  • When the client finishes printing, set the status to 3, “Printed”
  • If for some reason the client fails to print the data, set the status back to 1 (and PrintedAt back to null)
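
A minimal sketch of how that workflow might look in T-SQL, assuming a hypothetical PrintQueue table; all names below are placeholders rather than the poster's actual schema, and the select-then-claim steps still need to be made atomic (see the UPDATE ... OUTPUT example further down) so two clients cannot grab the same row:

--  Hypothetical queue table; names are placeholders
CREATE TABLE PrintQueue
 (
   Id           int           identity(1,1) not null
  ,Payload      varchar(100)  not null
  ,PrintStatus  tinyint       not null  --  1 = Not Printed, 2 = Being Printed, 3 = Printed
  ,PrintedAt    datetime      null
 )
GO

DECLARE @Id int = 1  --  the row a client picked up

--  What is waiting to be printed?
SELECT Id, Payload
 from PrintQueue
 where PrintStatus = 1

--  Claim the row before printing
UPDATE PrintQueue
 set PrintStatus = 2, PrintedAt = getdate()
 where Id = @Id

--  Printing succeeded
UPDATE PrintQueue
 set PrintStatus = 3
 where Id = @Id

--  Printing failed: back to "Not Printed"
UPDATE PrintQueue
 set PrintStatus = 1, PrintedAt = null
 where Id = @Id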

This leaves the janitor issue, aka what happens if printing fails and that fact is not reported (because the janitor pulled the plug on the server. Don’t laugh, it happened here, though it was a tech repair guy who really should have known better.) If you worry about such things:

  • Add some form of time logging, such as column “PrintedAt”, which is set to null whenever PrintStatus is set to 1, and set to getdate() when PrintStatus is set to 2. Side benefit: now you know when (or perhaps "which printing") your data was printed!
  • PrintStatus gets set to 3 when printing is done. Thus, if PrintStatus = 2, printing has (so far) taken getdate() – PrintedAt time to run.
  • Have a regularly scheduled process (SQL Agent jobs are good for this) run periodically. It checks over the table for any rows with PrintStatus = 2 where the time elapsed since PrintedAt is greater than your tolerance (1 day? 2 hours?). Any such rows get “rolled back” by setting PrintStatus to 1 and PrintedAt to null.
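
The scheduled cleanup can then be a single statement along these lines, run from the Agent job; this reuses the hypothetical PrintQueue table above and a made-up two-hour tolerance:

--  Roll back anything stuck in "Being Printed" longer than the tolerance
UPDATE PrintQueue
 set
   PrintStatus = 1
  ,PrintedAt = null
 where PrintStatus = 2
  and PrintedAt < dateadd(hour, -2, getdate())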

This can seem like overkill, but you tell me how critical consistent data is to you.


Added:

Yes, concurrency and consistency are a problem. While you can mess around with BEGIN TRANSACTION ... COMMIT/ROLLBACK, I prefer leveraging the power of implicit transactions. Here's one way:

This sets up a testing table with some data:

--  Set up test data
CREATE TABLE Test
 (
   Data         int       not null identity(1,1)
  ,PrintStatus  tinyint   not null
  ,PrintedAt    datetime  null
 )

INSERT Test (PrintStatus) values
 (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)

SELECT *
 from Test
GO

This bit contains an UPDATE statement that updates all rows to be printed, and uses the OUTPUT clause to store those items into a table variable. The contents of that table variable are then returned to the printing routine.

--  First pass: Get everything to print
DECLARE @ToBePrinted table
 (Data  int  not null)


UPDATE Test
 set
   PrintStatus = 2
  ,PrintedAt = getdate()
 output inserted.Data into @ToBePrinted (Data)
 where PrintStatus = 1


SELECT Data PrintThese
 from @ToBePrinted


SELECT * from Test

GO

Here, we add three more rows of data, then run the routine again. Old items are not selected, new ones are.

--  Second pass: Add three new items, they (and only they) get selected
INSERT Test (PrintStatus) values
 (1),(1),(1)

SELECT *
 from Test


DECLARE @ToBePrinted table
 (Data  int  not null)


UPDATE Test
 set
   PrintStatus = 2
  ,PrintedAt = getdate()
 output inserted.Data into @ToBePrinted (Data)
 where PrintStatus = 1


SELECT Data PrintThese
 from @ToBePrinted


SELECT * from Test

This works because of the ACID properties of SQL Server--specifically the “I”, Isolation. Once that one statement begins to modify the contents of the table, no other statement can access the data being modified until the statement has finished executing. (Unless you're fool enough to use NOLOCK. Do not use NOLOCK.)
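
If you want every client to share one code path, one option (just a sketch; the procedure name is made up) is to wrap that single claim-and-return statement in a stored procedure and have each C# client call it:

--  Hypothetical wrapper around the claim step; works against the Test table above
CREATE PROCEDURE usp_ClaimRowsToPrint
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @ToBePrinted table
     (Data  int  not null)

    --  One atomic statement: mark the rows and capture which ones were marked
    UPDATE Test
     set
       PrintStatus = 2
      ,PrintedAt = getdate()
     output inserted.Data into @ToBePrinted (Data)
     where PrintStatus = 1

    --  Only the rows this caller just claimed come back
    SELECT Data PrintThese
     from @ToBePrinted
END
GO

Each client then simply runs EXEC usp_ClaimRowsToPrint and prints whatever comes back; a second client calling at the same time only gets rows still at PrintStatus = 1, i.e. rows the first call did not claim.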

Philip Kelley
  • You should add protection against concurrent requests. In this case, the solution does not specify how to ensure that one, and only one, process can "pick up the data to print". As written, there is a concurrency weakness built into this. – Pittsburgh DBA May 19 '15 at 14:30
  • Thanks for your response! My concern now is: what if, say, client A grabs a “Not Printed” row, and before client A marks that row as “Being Printed”, client B kicks in and grabs the same row? Is there a way in SQL to mark that row when it is selected? I was reading some documentation on "select...for update", but it doesn't seem to prevent "select" – user2612061 May 19 '15 at 14:38
  • I've added one way to do this, with a brief explanation. For a full understanding, google "database acid" and read a few of the bazillion articles on the topic. (Doing this is well worth your time!) – Philip Kelley May 19 '15 at 17:34
  • (1 month later) Note to self: think twice before providing long and detailed answers to new SO members – Philip Kelley Jun 18 '15 at 22:17
0

I think you're on the right track. Either change your boolean/bit IsComplete field to a status field (with values like "New", "In Progress", and "Complete") or add a new bit field called IsInProgress. Then only download the ones that aren't already either InProgress or Complete.

If there is a way downloading or printing could fail, make sure you have logic to set the row back to a New status so that it will be picked up and tried again later. Or, perhaps you'd want to mark it as a different Error status instead to be handled as part of an exception process.
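
A rough sketch of that retry/error handling in T-SQL; the table, column, and status names below are placeholders, not anything from the question:

--  Hypothetical table; names and status values are placeholders
CREATE TABLE PrintJobs
 (
   Id      int          identity(1,1) not null
  ,Status  varchar(20)  not null   --  'New', 'InProgress', 'Complete', 'Error'
 )
GO

DECLARE @Id int = 1   --  the job that failed

--  Put the job back so it gets picked up and retried later...
UPDATE PrintJobs
 set Status = 'New'
 where Id = @Id

--  ...or flag it for a separate exception/review process instead
UPDATE PrintJobs
 set Status = 'Error'
 where Id = @Id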

jtower
  • How about two concurrent connections racing to get the same row? This needs to be beefed up with concurrency in mind. – Pittsburgh DBA May 19 '15 at 14:31
  • Hi Pittsburgh DBA, I think what you are talking about is my biggest concern now. What would you do to prevent the racing connections from getting the same row? – user2612061 May 19 '15 at 14:44
  • I'd recommend doing something like this: update the rows you want to process by using a new field (maybe BatchGUID) with a guid/uniqueid and using WITH (ROWLOCK) so no other process can read it while you're updating it. Then, read those rows in using the same GUID and process them. – jtower May 20 '15 at 15:25
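
For what it's worth, that BatchGUID idea might look roughly like the following; this is only a sketch, reusing the hypothetical PrintQueue table from the first answer's sketch and adding a made-up BatchGUID column:

--  Hypothetical claim column added to the PrintQueue sketch above
ALTER TABLE PrintQueue ADD BatchGUID uniqueidentifier null
GO

DECLARE @BatchGUID uniqueidentifier = newid()

--  Stamp (claim) the waiting rows with this client's GUID in a single statement
UPDATE PrintQueue with (rowlock)
 set
   PrintStatus = 2
  ,PrintedAt = getdate()
  ,BatchGUID = @BatchGUID
 where PrintStatus = 1

--  Read back only the rows this client just stamped
SELECT Id, Payload
 from PrintQueue
 where BatchGUID = @BatchGUID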