1

I have a thread that periodically checks one of my MS SQL tables for any records that have their "Processed" bit field set to 0. The thread then performs some code using those records and then sets their Processed bits to 1; using it basically as a queue. The Linq query that I'm using to retrieve those records spans multiple lines and is quite complicated (for reasons unimportant to the question), so here is a very simplified version:

var RecordsToProcess = MyTable.Where(i => i.Processed == 0); // Very simplified

I need to wait until all the records have been processed before continuing, so I would like to use something like this:

while (RecordsToProcess.Count() > 0)
{
    System.Threading.Thread.Sleep(1000);
}

The problem is that while the thread does in fact process the records and sets their Processed bit to 1, the value of RecordsToProcess.Count() in the condition statement never decreases and thus we get an infinite loop. My guess is that calling .Count() stores that integer in memory, and then each iteration of the loop looks at that value instead of querying the database to get the current count. I can get the behavior I want by moving the query into the condition statement like so:

while (true)
{
    if (MyTable.Where(i => i.Processed == 0).Count() > 0)
        System.Threading.Thread.Sleep(1000);
    else
        break;
}

Since the query I'm actually using is much more complicated than the one in this example, doing it this way makes it hard to read. Is there something I can use that's similar to RecordsToProcess.Count() > 0, but that queries that database each iteration instead of using the initial count that was stored in memory (assuming I'm correct on that)?

Note: I wouldn't normally use a potentially dangerous while loop like that, but I'll only need to run this page a maximum of 4 or 5 times and then never again. So I'm not too worried about it.

Jaimie Knox
  • 167
  • 2
  • 4
  • 13
  • "My guess is that calling .Count() stores that integer in memory" -- IIRC the first time you call `RecordsToProcess.Count()`, `RecordsToProcess` is evaluated, and the result of that evaluation is cached. So in effect, you're right. My solution would be `var count = MyTable.Where().Blah().Blah().Long().Query()` with appropriate newlines etc for readability, and then `if (count > 0) { ... }` -- or hide the LINQ in a method, `GetProcessedRecords()`. – 15ee8f99-57ff-4f92-890c-b56153 Jun 24 '16 at 20:09
  • Everything ultimately depends on your `RecordsToProcess` query. You should show us what that query is. You probably wrote it in such a way that it is already evaluated. – Jeff Mercado Jun 24 '16 at 20:25
  • Of course I don't see the rest of your code, but can't you just wait for the thread to finish? Or not even do it in another thread? – Gert Arnold Jun 24 '16 at 21:13

2 Answers2

4

Edited original post based on the comments.

I believe part of the issue is how the compiler optimizes loops.

It is likely something in your query that is caching data. If the whole query used lazy evaluation, except the Count being checked in the loop, every time you called Count on the query it would be re-evaluated. In your second example, the entire query is in the loop and thus has to be re-evaluated each time, regardless of whether or not is actually uses lazy evaluation. I would check the remarks in the MSDN documention on the operators you're using.

I would also suggest using Any instead of Count in this situation for performance and clarity. Depending on what you're iterating, Count will usually iterate over a collection to see how many elements, but Any is lazier. In LINQ to Object, Count() is optimized for sequences implementing ICollection, to use the Count property, which is much faster than iterating, and Any() stops checking after it finds 1 element. As suggested by Erik below, In LINQ to SQL there is likely something like a TOP 1 added to the SELECT statement. I would assume SQL has it's own COUNT optimization, but I haven't done any research.

Using Any() when appropriate can also help readability by getting rid of the operator in Count() > 0, and more clearly expresses that you're interested in a bool not an int.

I would implement your method like this:

var query = MyTable.Where(i => i.Processed == 0);
while(true) {
    if (!query.Any()) break;
    Thread.Sleep(1000);
}

or better yet this, if you can get it to execute lazily:

var query = MyTable.Where(i => i.Processed == 0);
while(query.Any()) { Thread.Sleep(1000); }

However, as mentioned by other answers, more information about how your query is constructed would be helpful.

JamesFaix
  • 8,050
  • 9
  • 37
  • 73
  • 1
    *Count must iterate a collection to see how many records* is not always true, [it depends on the collection](http://stackoverflow.com/a/7969468/209259). It terms of Linq2Sql and EF, I'm pretty confident that `Any()` changes the select to a `top 1` then just checks to see if there is or isn't a record (and might even just do `select 1 from..` to minimize network traffic. – Erik Philips Jun 24 '16 at 20:22
  • 2
    I doubt the compiler would make such an optimization. That would be an unsafe change. The condition of the while loop is _always_ evaluated, it's not going to cache it like you're suggesting. – Jeff Mercado Jun 24 '16 at 20:26
0

You are not refreshing the RecordsToProcess variable on each loop

While(RecordsToProcess.Count() > 0)
{
  System.Threading.Thread.Sleep(1000);
  RecordsToProcess = MyTable.Where(i => i.Processed == 0);
}