
I am trying to build a lookup data structure in C# for huge amounts of data. The plan is for it to scale to 1 billion entities without performance degrading; search performance should be in nanoseconds.

Currently, I have experimented with Lucene.Net and MongoDB. The problem with both of them is that they take hours to insert this many records, and even then their lookup performance is in milliseconds.

On the other hand, I have tried using List and ConcurrentBag in C#. They satisfy the performance constraints, but with 1 billion records the collection takes around 78 GB of RAM.
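
To show the kind of access pattern I mean, here is a simplified sketch of the in-memory approach (the Entity type, its fields, and the dictionary key are placeholders, not my real schema; my actual experiments used List and ConcurrentBag, this just illustrates keyed lookup from RAM):

    using System;
    using System.Collections.Generic;

    // Placeholder entity; the real records and their field sizes will differ.
    public sealed class Entity
    {
        public long Id { get; set; }
        public string Name { get; set; }
    }

    public static class InMemoryLookupSketch
    {
        public static void Main()
        {
            // Keyed lookup held entirely in RAM: O(1) average per probe,
            // typically tens of nanoseconds once the data is resident.
            var index = new Dictionary<long, Entity>(capacity: 1000000);

            for (long i = 0; i < 1000000; i++)
            {
                index[i] = new Entity { Id = i, Name = "entity-" + i };
            }

            Entity hit = index[12345]; // point lookup
            Console.WriteLine(hit.Name);

            // Back-of-envelope memory: ~78 GB / 1e9 records ≈ 78 bytes per record
            // (object header + references + string payload), so the footprint I see
            // with List/ConcurrentBag is roughly what managed objects cost.
        }
    }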

Is there any better way to work around this?

Raheel
  • Broadly speaking, no. You have to choose: either get a machine that can hold everything in memory, or commit to disk and take the performance hit. – spender Apr 30 '14 at 08:34
  • Not really. If you want the level of performance you are asking for, you must sacrifice something. – Sammaye Apr 30 '14 at 08:34
  • "The search performance should be in nanoseconds" == search from RAM – spender Apr 30 '14 at 08:36
  • You can look at Elasticsearch: http://www.elasticsearch.org/ – Apr 30 '14 at 08:40
  • @NeillVerreynne that still won't achieve nanoseconds – Marc Gravell Apr 30 '14 at 08:40
  • Putting that much RAM in a server is totally possible. If you look at the performance of SSDs and if you do the math, you'll see how you won't meet your perf goals. The working set is apparently 78GB. For any modern database, you'll need at least that much RAM for best perf. – WiredPrairie Apr 30 '14 at 10:55
  • And, without more details about the nature of the data, it's hard to say whether you could reduce the memory footprint in any significant way. – WiredPrairie Apr 30 '14 at 10:57
  • Can someone share any time/space comparisons between tools/techniques (in the context of .NET) for benchmarking? – Umer Apr 30 '14 at 13:02
  • Is it important for the database preparation time to be fast? You say MongoDB and Lucene.NET take hours, but isn't this just once-off? Finally, does it take hours when using [bulk insert](http://docs.mongodb.org/manual/core/bulk-inserts/)? – Simon MᶜKenzie May 01 '14 at 01:31
  • @SimonMᶜKenzie The database preparation time should be acceptable. I assumed it shouldn't take more than 30 minutes for this amount of data, but for me it was taking more than 3 hours. And yes, I was using bulk inserts with a batch size of 50,000 records (a sketch of that batching pattern follows below). – Raheel May 01 '14 at 06:49
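
For reference, a minimal sketch of the batched insert pattern described in the comment above, using the MongoDB .NET driver's InsertMany (the Entity type, database/collection names, and connection string are placeholders, not taken from the question):

    using System.Collections.Generic;
    using MongoDB.Driver;

    // Placeholder document type; the real records will differ.
    public sealed class Entity
    {
        public long Id { get; set; }
        public string Name { get; set; }
    }

    public static class BulkLoader
    {
        private const int BatchSize = 50000; // batch size mentioned in the comment above

        public static void Load(IEnumerable<Entity> source)
        {
            // Placeholder connection string, database, and collection names.
            var client = new MongoClient("mongodb://localhost:27017");
            var collection = client.GetDatabase("lookup").GetCollection<Entity>("entities");

            var batch = new List<Entity>(BatchSize);
            foreach (var entity in source)
            {
                batch.Add(entity);
                if (batch.Count == BatchSize)
                {
                    collection.InsertMany(batch); // one round trip per 50,000 documents
                    batch.Clear();
                }
            }

            if (batch.Count > 0)
            {
                collection.InsertMany(batch); // flush the final partial batch
            }
        }
    }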

0 Answers