
I need to write a C# service (it could be a Windows service or a console app) that processes a large amount of data (about 100,000 records) stored in a database. Processing each record is also a fairly complex operation, and I need to perform a lot of inserts and updates as part of the processing.

We are using NHibernate as the ORM.

One way is to load all the records and process them sequentially, which could turn out to be quite slow. I was looking at multithreading options and was thinking of having multiple threads process chunks of records simultaneously.

Could anyone give me some pointers on how to approach this, given that I'm using NHibernate? What are the possible gotchas, such as deadlocks?

Thanks a lot.

Sennin

4 Answers


You should consider the Task Parallel Library (TPL).

Pradeep

Assuming you are using .NET 4.0, you can use the Task Parallel Library (as has been mentioned) to do something like this:

Parallel.ForEach(sourceCollection, item => Process(item));

Your source collection would be an IEnumerable of the loaded records. The library handles the partitioning and scheduling for you:

The source collection is partitioned and the work is scheduled on multiple threads based on the system environment. The more processors on the system, the faster the parallel method runs.
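
Since the question asks specifically about processing chunks of records, here is a minimal sketch that controls the chunk size explicitly instead of relying on the default partitioner. It assumes the records can be addressed by index (for example, a pre-loaded list of IDs) and uses Partitioner.Create(from, to, rangeSize) to hand each worker a contiguous range. `ChunkedProcessing`, `ProcessInChunks`, and the chunk callback are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedProcessing
{
    // Splits the index range [0, count) into contiguous chunks of at most
    // chunkSize items and processes the chunks in parallel. Each chunk is
    // handled by a single call of processChunk on a single thread, so anything
    // created inside the callback (for example a session) is never shared.
    // Returns the number of chunks processed.
    public static int ProcessInChunks(int count, int chunkSize, int maxParallelism,
                                      Action<int, int> processChunk)
    {
        if (count <= 0) return 0; // Partitioner.Create rejects an empty range

        int chunks = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        // Partitioner.Create(from, to, rangeSize) yields Tuple<int, int> ranges
        // where Item1 is inclusive and Item2 is exclusive.
        Parallel.ForEach(Partitioner.Create(0, count, chunkSize), options, range =>
        {
            processChunk(range.Item1, range.Item2);
            Interlocked.Increment(ref chunks);
        });

        return chunks;
    }
}
```

In the question's scenario, the callback could load the records for one ID range, open one NHibernate session and transaction for that chunk, and commit before returning. Capping MaxDegreeOfParallelism (for example at 4 rather than the processor count) is often sensible when the database, not the CPU, is the bottleneck, because extra threads then mostly add lock contention.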

It may help to read a tutorial on using Parallel.ForEach(). Also, be aware of potential pitfalls.

Jon
Thanks, Jon, and everyone else for your suggestions. I'm looking at the Task Parallel Library now. I'm wondering how I would manage NHibernate sessions here, since I have no control over the parallel threads. Ideally I would want each parallel thread to have its own session. Any thoughts on how that can be accomplished with TPL? – Sennin Mar 17 '11 at 05:58

It sounds like PLINQ is the best solution (Chapter 5 in this article). However, because each calculation works heavily with the database, you should create a separate session for each thread.
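
TPL does not expose its threads, but the Parallel.ForEach overload that takes localInit and localFinally delegates gives each worker task its own state object, which covers the "one session per thread" requirement raised in the comment on the previous answer. Strictly, the state is per worker task rather than per OS thread, but a given state object is only used by one thread at a time, which is what a non-thread-safe ISession needs. This is a minimal sketch using Parallel.ForEach rather than PLINQ; `PerWorkerState` and `Run` are hypothetical names, and the NHibernate wiring appears only in comments as an untested assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PerWorkerState
{
    // Runs processItem for every item in parallel. Each worker task gets its
    // own state object from openState (localInit), reuses it for all items it
    // processes, and passes it to closeState (localFinally) when it finishes.
    //
    // Possible NHibernate wiring (assumption, untested here):
    //   openState   = () => sessionFactory.OpenSession()
    //   processItem = (record, session) =>
    //   {
    //       using (var tx = session.BeginTransaction())
    //       {
    //           /* load, modify, session.SaveOrUpdate(entity) */
    //           tx.Commit();
    //       }
    //       session.Clear(); // keep the first-level cache from growing
    //   }
    //   closeState  = session => session.Dispose()
    public static void Run<TItem, TState>(
        IEnumerable<TItem> items,
        int maxParallelism,
        Func<TState> openState,
        Action<TItem, TState> processItem,
        Action<TState> closeState)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(
            items,
            options,
            openState,                      // localInit: once per worker task
            (item, loopState, state) =>     // body: reuses that worker's state
            {
                processItem(item, state);
                return state;
            },
            closeState);                    // localFinally: once per worker task
    }
}
```

Keeping each transaction short, as in the commented wiring, shortens how long row locks are held, which tends to reduce (though not eliminate) the deadlock risk the question asks about. Entities loaded by one worker's session should also not be handed to another worker.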

Alex Zhevzhik

Use IStatelessSession if possible, and experiment with the adonet.batch_size property.
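
A minimal configuration sketch of that advice, assuming a hibernate.cfg.xml with the mappings already exists: the batch size is set through the Configuration API (NHibernate.Cfg.Environment.BatchSize is the constant for the adonet.batch_size key), and one chunk of processed records is written through an IStatelessSession. `StatelessBatchWriter`, `BuildFactory`, and `WriteChunk` are hypothetical names, and 100 is only a starting value to tune. A stateless session has no first-level cache, no automatic dirty checking, and no cascades, and it ignores collections, so every insert and update must be issued explicitly. Insert batching also generally does not help when keys come from identity columns.

```csharp
using System.Collections.Generic;
using NHibernate;
using NHibernate.Cfg;

public static class StatelessBatchWriter
{
    // Build the factory once and share it: ISessionFactory is thread-safe,
    // while sessions (stateful or stateless) are not.
    public static ISessionFactory BuildFactory(int batchSize = 100)
    {
        var cfg = new Configuration().Configure(); // reads hibernate.cfg.xml (assumed present)
        cfg.SetProperty(NHibernate.Cfg.Environment.BatchSize, batchSize.ToString());
        return cfg.BuildSessionFactory();
    }

    // Writes one chunk of already-processed entities in a single short
    // transaction. Call this from one worker at a time per session; each call
    // opens its own stateless session.
    public static void WriteChunk<T>(ISessionFactory factory,
                                     IEnumerable<T> toInsert,
                                     IEnumerable<T> toUpdate)
    {
        using (IStatelessSession session = factory.OpenStatelessSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            foreach (T entity in toInsert) session.Insert(entity);
            foreach (T entity in toUpdate) session.Update(entity);
            tx.Commit();
        }
    }
}
```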

Also, how performant does it need to be? I'm a fan of NH, but this is one scenario where stored procedures might be better.

Jason Freitas
High performance is required, as always :). However, I'm not too keen on moving all the business logic into a sproc. – Sennin Mar 17 '11 at 05:53