2

I have a linq subquery as follows:

        for (int i = 0; i < parentAttributes.Length; i++)
        {
            Guid parent = parentAttributes[i];
            var subQuery = from sc in db.tSearchCluendexes
                           join a in db.tAttributes on sc.AttributeGUID equals a.GUID
                           join pc in db.tPeopleCluendexes on a.GUID equals pc.AttributeGUID
                           where a.RelatedGUID == parent && userId == pc.CPSGUID                             
                           select sc.CPSGUID;

            query = query.Where(x => subQuery.Contains(x.Id));
        }

The basic idea is that I filter a list of data (as represented by the query variable) based on the results of a subquery executed an indeterminable number of times.

However I now have an issue whereby I only want to execute the where on the query if the subquery actually contains data and I don't want to execute the subquery within each iteration of the loop (for example by using a Count() or a ToList()) as this will hit the remote resource. Something like below if we take it that a method HasData() actually returns true or false:

        for (int i = 0; i < parentAttributes.Length; i++)
        {
            Guid parent = parentAttributes[i];
            var subQuery = from sc in db.tSearchCluendexes
                           join a in db.tAttributes on sc.AttributeGUID equals a.GUID
                           join pc in db.tPeopleCluendexes on a.GUID equals pc.AttributeGUID
                           where a.RelatedGUID == parent && userId == pc.CPSGUID                             
                           select sc.CPSGUID;

            if (subQuery.HasData())                
               query = query.Where(x => subQuery.Contains(x.Id));
        }

Suggestions are welcome.

kh25
  • 1,238
  • 1
  • 15
  • 33

2 Answers2

4

Your suggestions of using Count and ToList are not the fastest way to see if a query has any results. You should use the Any method:

if (subQuery.Any())
{            
   query = query.Where(x => subQuery.Contains(x.Id));
}

Note that this will still be evaluated immediately. If you want to delay execution, you should put it inside the lambda function.

query = query.Where(x => !subQuery.Any() || subQuery.Contains(x.Id));
Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • 1
    There is no benefit from putting the Any within the lambda expression – Aducci Mar 23 '12 at 19:10
  • Thanks I'll try this tomorrow. With the Any() in the lambda will the execution be deferred or will it make no difference as per the comment from Aducci? – kh25 Mar 24 '12 at 14:57
  • @kh25: I don't thick Aducci said it would make no difference. He said only that there would be "no benefit". I'm not entirely sure what he means by that. Perhaps you could test it and let us know? – Mark Byers Mar 24 '12 at 21:51
  • Mark your solutution looks good. However I can't see any difference or benefit in having it inside the Lambda or not. I'm also not able to tell conclusively whether the execution has been deferred or otherwise with the any clause (I'm using fiddler but it shows no hits to the remote database either way). Any suggestions on how to tell for definite whether the db has been hit? – kh25 Mar 25 '12 at 14:42
  • @kh25: What do you mean by the "remote" database? Do you have more than one database? Can you enable the database profiler and step through the code to see when the SQL is fired? Also, if you can't see any difference in performance either way, do you really need to worry about it? – Mark Byers Mar 25 '12 at 18:51
  • Mark by remote I mean the database is hosted on a different server from where this code will be run. Unfortunately I don't have any access to enable the database profiler - it's a third party db. Finally I can't use the code above if it does make a request to the database everytime it's executed because it's in a loop and could potentially be ran dozens of times in whcih case you'll see a drastic reduction in performance if it goes off and makes a request to the db each time. I'll do some investigation later and see what's happening when it's run. – kh25 Mar 26 '12 at 08:58
0

After some very detailed checking of Mark's response I can confirm that the Any() should be in the lambda expression as otherwise the code will make a call directly to the database during every iteration of the loop (not desirable).

So for completeness sake here is the code I'd suggest using if you have a similar problem:

        for (int i = 0; i < parentAttributes.Length; i++)
        {
            Guid parent = parentAttributes[i];
            var subQuery = from sc in db.tSearchCluendexes
                           join a in db.tAttributes on sc.AttributeGUID equals a.GUID
                           join pc in db.tPeopleCluendexes on a.GUID equals pc.AttributeGUID
                           where a.RelatedGUID == parent && userId == pc.CPSGUID                             
                           select sc.CPSGUID;


            query = query.Where(x => !subQuery.Any() || subQuery.Contains(x.Id));                
        }

This code will concatenate a series of exist statements to the sql that will eventually be called at a later stage in the process. If you have the Any() outside the lambda these SQL exists are all called individually.

Thanks again Mark - top man.

kh25
  • 1,238
  • 1
  • 15
  • 33