1

I have a database with about 5 million rows in it. I am trying to generate XML strings for the database and push them to a service. Instead of doing this one at a time, the service supports taking 1000 records at a time. At the moment, this is quite slow, taking upwards of 10 seconds per 1000 records (including writing back to the database and uploading to the service).

I tried to get the following code working, but have failed... I get a crash when I try it. Any ideas?

    var data = <insert LINQ query here>
    int take = 1000
    int left = data.Count();

    Parallel.For(0, left / 1000, i =>
        {
            data.Skip(i*1000).Take(1000)...
            //Generate XML here.
            //Write to service here...
            //Mark items in database as generated.
        });
        //Get companies which are still marked as not generated.
        //Create XML.
        //Write to Service.

I get a crash telling me that the index is out of bounds. If left is 5 million, the number in the loop should be no more than 5000. If I multiply that again by 1000, I should not get more than 5 million. I wouldn't mind if it worked for a bit, and then failed, but it just fails after the SQL query!

Cœur
  • 37,241
  • 25
  • 195
  • 267
TiernanO
  • 1,597
  • 14
  • 31
  • 1
    An "Index out of bounds" exception tells me that the problem is probably not directly in the code you have included. Can you include a little more information about the data query, and what you do after the skip/take? If possible, simplify your own code until you have a bare-minimum example of what causes the exception, and then post that. Also tell us which line the error is actually occurring at. – StriplingWarrior Feb 03 '11 at 17:33
  • Take a look at my answer - you have a bug in your code – Andrey Feb 03 '11 at 17:41
  • Thanks lads for the comments. the error was occuring when i was doing the take (data.Skip(i*1000).Take(1000). seems that @Andrey found a bug in my code... see answer below... – TiernanO Feb 04 '11 at 09:37

2 Answers2

3

I think it doesn't like your last index value - it should be left / 1000 -1, not left / 1000:

Parallel.For(0, left / 1000 - 1, i =>
        {
            data.Skip(i*1000).Take(1000)...
            //Generate XML here
            //Write to Service here...
            //mark items in DB as generated
        });
Andrey
  • 20,487
  • 26
  • 108
  • 176
  • Doh! thanks for the tip. I have made code changes to use the Custom Partitions as that seems to be a little more clean than my original code. it also takes into account (by the looks of things) the remainder, which i was fixing after... – TiernanO Feb 04 '11 at 09:35
2

I suspect the index out of bounds error is caused by code other than what is currently being displayed.

That being said, this could be handled in a much cleaner manner. Instead of using this approach, you should consider switching to using a custom partitioner. This will be dramatically more efficient, as each call to Skip/Take is going to force a re-evaluation of your sequence.

Reed Copsey
  • 554,122
  • 78
  • 1,158
  • 1,373
  • Ok, so other than my code being a little wrong (woops!) seems that this custom partitioner is the way to go. i think i still need to use takes, but it does work the way i want it to work in early testing. waiting on a DB rebuilt so only testing with small amounts of data, but i think this is all good! Thanks! – TiernanO Feb 04 '11 at 09:32