I have a client/server architecture written in .NET where there are hundreds of clients sending data to one server. Each item has an id and it is possible for different clients to send the same id multiple times.
The ids are longs and the server needs to know if it has already received something with the same id. Every day the server will receive about 10,000,000 ids with ~1,000,000 duplicates. Every time it receives an id it will need to do some sort of lookup to see whether it has already been dealt with. It is extremely unlikely to get a duplicate id after a few days.
My current ideas for solutions are:
In-memory dictionary of ids, with a background thread that removes items after they have been in the dictionary for over 3 days.
MySQL database with one indexed column for ids and a column for the insertion date.
The issue I foresee with the MySQL approach is query speed, because I would have to do ~10,000,000 queries a day. I am not going to be using fancy hardware for this particular problem (a typical development system) and don't want to tax it 100%. The problem with the in-memory solution is that it will be a hassle to write the background worker (concurrency), and everything is lost in an unlikely but possible crash.
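For what it's worth, the in-memory version may be less of a hassle than it sounds, since `ConcurrentDictionary` handles the locking and a `Timer` can do the eviction without a hand-written worker thread. Here is a minimal sketch of what I mean (class name, TTL, and cleanup interval are all just placeholders I made up):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch of the in-memory idea: ids map to the time they were first seen,
// and a timer periodically evicts entries older than the TTL (e.g. 3 days).
public sealed class RecentIdTracker : IDisposable
{
    private readonly ConcurrentDictionary<long, DateTime> _seen = new();
    private readonly TimeSpan _ttl;
    private readonly Timer _cleanup;

    public RecentIdTracker(TimeSpan ttl, TimeSpan cleanupInterval)
    {
        _ttl = ttl;
        _cleanup = new Timer(_ => Evict(), null, cleanupInterval, cleanupInterval);
    }

    // Returns true if the id was NOT already present (i.e. it is new),
    // false if it is a duplicate. TryAdd is atomic, so no extra locking.
    public bool TryRegister(long id) => _seen.TryAdd(id, DateTime.UtcNow);

    private void Evict()
    {
        var cutoff = DateTime.UtcNow - _ttl;
        // Enumerating a ConcurrentDictionary while modifying it is safe.
        foreach (var pair in _seen)
            if (pair.Value < cutoff)
                _seen.TryRemove(pair.Key, out _);
    }

    public void Dispose() => _cleanup.Dispose();
}
```

This still doesn't solve the crash-recovery problem, but it shows that the concurrency side is mostly handled by the framework.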