When finding and using a single result in Iinq using ToArray() huge performance increase

Question

I had some code for processing XML file that was running pretty slowly. The offending code is below.
The first query should get either zero or one record from a set of about 1,000 records. If I get one, I then pull from another set (holding about 2,000 records) the needed record using a keyID from the first result. I then take that record, if found, and assign three values to a new object I'm creating.

What I found when checking the performance (execution time) in VS2015 is that all these statements took about 330 ms total. Specifically the assignment of each value (e.g MatProfileCall.MatSizeText = mss.First().MatSizeText;) took around 80 ms each. I then added a .ToArray() to the two queries into vars. The execution went down to less than 40 ms for all the code.

So my question is, in a situation like this, does the assignment of a value from a Linq query that is based on another Linq query actually have to run both queries each time? And if that is true, then does it follow that anytime you are using a query result to assign values to another object there is always a point when putting the result into an array/list is a much better performance because Linq has to execute the query each time?

var subMatSize = (from d in pInEdiMatTransS where d.EdiSystemMaterialString == retString && d.IsMapToMatSizeID == true && d.IsSubstituteMatType == false select d);
if (subMatSize.Any())
{
   var mss = (from d in pmss where d.MatSizeID == subMatSize.First().DeafultMatSizeID  select d);
if(mss.Any())
{ 
    MatProfileCall.MatSizeText = mss.First().MatSizeText;
    MatProfileCall.MatSizeID = mss.First().MatSizeID;
    MatProfileCall.MatTypeID = mss.First().MatTypeID;
    return Result.Ok<EDIMaterialProFile>(MatProfileCall);
    }
     else
    { return Result.Fail<EDIMaterialProFile>(THC_Constants.sMICEDICONVERTERROR + "IsMapToMatSizeID is True but MatSizeID not found in MaterialSize");  }
}

Same code with .ToArray() producing a much faster code:

var subMatSize = (from d in pInEdiMatTransS where d.EdiSystemMaterialString == retString && d.IsMapToMatSizeID == true && d.IsSubstituteMatType == false select d).ToArray();
if (subMatSize.Any())
{
   var mss = (from d in pmss where d.MatSizeID == subMatSize.First().DeafultMatSizeID  select d).ToArray();
   if(mss.Any())
   { 
    MatProfileCall.MatSizeText = mss.First().MatSizeText;
    MatProfileCall.MatSizeID = mss.First().MatSizeID;
    MatProfileCall.MatTypeID = mss.First().MatTypeID;
    return Result.Ok<EDIMaterialProFile>(MatProfileCall);
   }
   else
   { return Result.Fail<EDIMaterialProFile>(THC_Constants.sMICEDICONVERTERROR + "IsMapToMatSizeID is True but MatSizeID not found in MaterialSize");  }
}

Please proof-read and format your question. The run-on sentences, the continuous block of text at the top, and the oddly formatted code make this question is hard to read. — Sam I am says Reinstate Monica, Mar 28 '17 at 19:41

D Stanley · Answer 1 · 2017-03-28T19:59:36.210

My guess is that you're re-using the initial queries in the first case, but just querying arrays in the second case, so you aren't reprocessing the queries again:

var subMatSize = (from d in pInEdiMatTransS 
                  where d.EdiSystemMaterialString == retString 
                     && d.IsMapToMatSizeID == true 
                     && d.IsSubstituteMatType == false 
                  select d);

Method 1:

var mss = (from d in pmss       V-- subMatSize is a query
           where d.MatSizeID == subMatSize.First().DeafultMatSizeID 
           select d);

in this case, the call to First() reruns the subMatSize query, wasting resources.

Method 2:

var mss = (from d in pmss       V-- subMatSize is an array
           where d.MatSizeID == subMatSize.First().DeafultMatSizeID 
           select d);

Here, you're just asking for the first item in an array, which is much faster. You do something similar for mss but are doing it three times. One for each First() call.

You could possibly get similar (or better) results by calling FirstOrDefault and checking for null versus calling Any:

var subMatSize = (from d in pInEdiMatTransS 
                  where d.EdiSystemMaterialString == retString 
                     && d.IsMapToMatSizeID == true 
                     && d.IsSubstituteMatType == false 
                  select d);
var firstSubMat = subMatSize.FirstOrDefault();

if (firstSubMat != null)
{
   var mss = (from d in pmss       V-- use variable here
              where d.MatSizeID == firstSubMat.DeafultMatSizeID
              select d);
   var firstmss = mss.FirstOrDefault();

   if(firstmss != null)
   { 
        MatProfileCall.MatSizeText = firstmss.MatSizeText;
        MatProfileCall.MatSizeID   = firstmss.MatSizeID;
        MatProfileCall.MatTypeID   = firstmss.MatTypeID;
        return Result.Ok<EDIMaterialProFile>(MatProfileCall);
    }

So my question is, in code like this, does the assign of a value from a Linq query that is based on another Linq query actually have to run both queries each time and therefore anytime you are using a query result to assign values to another object there is always a point when putting the result into an array/list is a much better performance because Linq has to execute the query each time?

Yes. In this case, though, you only care about the first result so you don't need to store the entire result set, only the first one.

OK thanks I had not thought this through but it does make sens now what is going on. I retest with original, the array, and your suggestion I got (yours fastest): Original: 340ms Array: 8 ms Yours 4 ms — patrickL, Mar 28 '17 at 21:50

When finding and using a single result in Iinq using ToArray() huge performance increase

1 Answers1