I have a client/server architecture written in .NET where there are hundreds of clients sending data to one server. Each item has an id and it is possible for different clients to send the same id multiple times.
The ids are longs and the server needs to know if it has already received something with the same id. Every day the server will receive about 10,000,000 ids with ~1,000,000 duplicates. Every time it receives an id it will need to do some sort of lookup to see whether it has already been dealt with. It is extremely unlikely to get a duplicate id after a few days.
My current ideas for solutions are:
In-memory dictionary of ids, with a background thread that removes items after they have been in the dictionary for over 3 days.
MySQL database with one indexed column for ids and a column for the insertion date.
The issue I foresee with the MySQL approach is query speed, because I would have to do ~10,000,000 queries a day. I am not going to be using fancy hardware for this particular problem (a typical development system) and don't want to tax it 100%. The problem with the in-memory solution is that it will be a hassle to write the background worker (concurrency), and everything is lost in an unlikely but possible crash.
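For what it's worth, the in-memory version may be less of a hassle than it sounds, since `ConcurrentDictionary` handles the locking and a `Timer` can do the eviction without a hand-written worker thread. Here is a minimal sketch of what I mean (class name, TTL, and cleanup interval are all just placeholders I made up):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch of the in-memory idea: ids map to the time they were first seen,
// and a timer periodically evicts entries older than the TTL (e.g. 3 days).
public sealed class RecentIdTracker : IDisposable
{
    private readonly ConcurrentDictionary<long, DateTime> _seen = new();
    private readonly TimeSpan _ttl;
    private readonly Timer _cleanup;

    public RecentIdTracker(TimeSpan ttl, TimeSpan cleanupInterval)
    {
        _ttl = ttl;
        _cleanup = new Timer(_ => Evict(), null, cleanupInterval, cleanupInterval);
    }

    // Returns true if the id was NOT already present (i.e. it is new),
    // false if it is a duplicate. TryAdd is atomic, so no extra locking.
    public bool TryRegister(long id) => _seen.TryAdd(id, DateTime.UtcNow);

    private void Evict()
    {
        var cutoff = DateTime.UtcNow - _ttl;
        // Enumerating a ConcurrentDictionary while modifying it is safe.
        foreach (var pair in _seen)
            if (pair.Value < cutoff)
                _seen.TryRemove(pair.Key, out _);
    }

    public void Dispose() => _cleanup.Dispose();
}
```

This still doesn't solve the crash-recovery problem, but it shows that the concurrency side is mostly handled by the framework.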