
I have this code where scheduler.PageResults contains millions of rows.

var AllNonHTMLPages = scheduler.PageResults
                         .Where(p => (p.SkipReason & SkipReasonEnum.NoHTML) == SkipReasonEnum.NoHTML);
Console.WriteLine("# All Non HTML Pages: {0}", AllNonHTMLPages.Count());
foreach (PageData page in AllNonHTMLPages) { Console.WriteLine("Non HTML Page: {0}", page.Url); }

foreach (PageData page in scheduler.PageResults
        .Where(p => p.SkipReason.IsFlagSet(SkipReasonEnum.None))
        .OrderByDescending(p => p.IndexPath.Length))
{
   .....
}

The Roslyn Contributing Code guidelines state:

  • Avoid allocations in compiler hot paths:
  • Avoid LINQ.
  • Avoid using foreach over collections that do not have a struct enumerator.
  • Consider using an object pool. There are many usages of object pools in the compiler to see an example.

I understand that LINQ is slow. Any ideas for optimizing this without the LINQ API?
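For reference, a LINQ-free rewrite of the first query is a plain foreach with a manual counter, done in a single pass (the original enumerates once for Count() and again for the loop). This is only a sketch; SkipReasonEnum, PageData, and NonHtmlScan are hypothetical stand-ins for the types in the question:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the types used in the question.
[Flags]
enum SkipReasonEnum { None = 0, NoHTML = 1 }

class PageData
{
    public SkipReasonEnum SkipReason;
    public string Url;
}

static class NonHtmlScan
{
    // Single pass, no LINQ: count and print the non-HTML pages in one loop
    // instead of enumerating once for Count() and again in the foreach.
    public static int PrintNonHtmlPages(List<PageData> pages)
    {
        int nonHtmlCount = 0;
        foreach (PageData page in pages) // List<T>.Enumerator is a struct, no allocation
        {
            if ((page.SkipReason & SkipReasonEnum.NoHTML) == SkipReasonEnum.NoHTML)
            {
                Console.WriteLine("Non HTML Page: {0}", page.Url);
                nonHtmlCount++;
            }
        }
        Console.WriteLine("# All Non HTML Pages: {0}", nonHtmlCount);
        return nonHtmlCount;
    }
}
```

Note this also fixes a subtle cost in the original: calling Count() on a Where() result walks the whole sequence, so the millions of rows were being filtered twice.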

LeMoussel
    "I understand that LINQ is slow" - the Roslyn team aren't suggesting no-one uses LINQ, just that it's not appropriate within Roslyn. Unfortunately we have very little idea what performance you're seeing, or where `scheduler.PageResults` comes from... you need to give us more information. – Jon Skeet Sep 09 '15 at 05:49

1 Answer


I have found that using LINQ to loop over large sets of objects (more than about 5,000) is simply slow when there is a lot of data per row. So if you have access to the database and can do the filtering there, you will get orders-of-magnitude better performance. I never implemented it myself (I wrote T-SQL instead), but there are also packages you can buy that build in-memory indexes of your objects to speed things up.
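To illustrate the in-memory index idea with plain collections (a sketch, not any particular package; the type names are invented): bucket the pages by skip reason once, so later lookups touch only the matching bucket instead of rescanning millions of rows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the types used in the question.
[Flags]
enum SkipReasonEnum { None = 0, NoHTML = 1, Redirect = 2 }

class PageData
{
    public SkipReasonEnum SkipReason;
    public string Url;
}

static class PageIndex
{
    // Build once in O(n); each later lookup is one dictionary access
    // plus the size of the matching bucket, instead of a full scan.
    public static Dictionary<SkipReasonEnum, List<PageData>> BuildBySkipReason(
        IEnumerable<PageData> pages)
    {
        var index = new Dictionary<SkipReasonEnum, List<PageData>>();
        foreach (PageData page in pages)
        {
            List<PageData> bucket;
            if (!index.TryGetValue(page.SkipReason, out bucket))
            {
                bucket = new List<PageData>();
                index[page.SkipReason] = bucket;
            }
            bucket.Add(page);
        }
        return index;
    }
}
```

One caveat: because SkipReason is a [Flags] enum, a page whose value combines several flags lands in the bucket for that exact combined value, so flag-style queries still need to test each key's bits rather than look up a single key.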

patrickL