If I need just the maximum (or the 3 biggest items) of an array and I get it with myArray.OrderByDescending(...).First() (or myArray.OrderByDescending(...).Take(3)), it is about 20 times slower than calling myArray.Max(). Is there a way to write a faster LINQ query? This is my sample:

using System;
using System.Linq;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var array = new int[1000000];
            for (int i = 0; i < array.Length; i++)
            {
                array[i] = i;
            }

            var maxResults = new int[10];
            var linqResults = new int[10];
            var start = DateTime.Now;

            for (int i = 0; i < maxResults.Length; i++)
            {
                maxResults[i] = array.Max();
            }
            var maxEnd = DateTime.Now;

            for (int i = 0; i < maxResults.Length; i++)
            {
                linqResults[i] = array.OrderByDescending(it => it).First();
            }
            var linqEnd = DateTime.Now;

            // 00:00:00.0748281
            // 00:00:01.5321276
            Console.WriteLine(maxEnd - start);
            Console.WriteLine(linqEnd - maxEnd);
            Console.ReadKey();
        }
    }
}
Jeno Csupor
  • `var linqResults = array.OrderByDescending(it => it).Take(10).ToArray();` – Dmitry Bychenko Dec 07 '19 at 19:24
  • Getting the max is O(n), while sorting is probably O(n log(n)). Ergo, don't sort to get the max. – Julian Dec 07 '19 at 19:26
  • Why do you think you are not using LINQ when you call array.Max()? You are using LINQ in both cases, just doing one of them in a much longer way. – Cetin Basoz Dec 07 '19 at 19:29
  • Please read the Speed Rant: https://ericlippert.com/2012/12/17/performance-rant/ – Christopher Dec 07 '19 at 19:41
  • LINQ to SQL is, I suppose, optimized not to sort the entire table and then take just 1 item from it. That's why I thought that LINQ to Objects would be smart enough not to sort the entire array. – Jeno Csupor Dec 07 '19 at 19:43
  • Your title is misleading as it's not just LINQ that is slower, you've picked a completely different thing to do. The act of just iterating through the collection once, collecting the highest value along the way, is orders of magnitude faster than having to sort the entire collection and then grab one value from one end. – Lasse V. Karlsen Dec 07 '19 at 19:53
  • In a database, if your regular data access pattern included getting the three biggest items, then I'd hope you'd have an index on that column. At that point, the optimizer could figure out a fast path to the solution. There are a lot of folks who think that a LINQ solution is always the best one. As several people have pointed out, getting the Max(N) from a collection is an O(N) operation in a very simple loop. – Flydog57 Dec 07 '19 at 20:27

3 Answers

You sort the initial array 10 times in a loop:

    for (int i = 0; i < maxResults.Length; i++)
    {
        linqResults[i] = array.OrderByDescending(it => it).First();
    }

Let's do it once:

    // top 10 items of the array
    var linqResults = array
      .OrderByDescending(it => it)
      .Take(10)
      .ToArray(); 

Please note that

    for (int i = 0; i < maxResults.Length; i++)
    {
         maxResults[i] = array.Max();
    }

just repeats the same Max value 10 times (it doesn't return the top 10 items).

Dmitry Bychenko
  • Still, getting the max 10 can be done in O(n). Guess the challenge is the linq requirement. – Julian Dec 07 '19 at 19:32
  • I've done it 10 times just to measure the time more accurately. I want to do it just once. – Jeno Csupor Dec 07 '19 at 19:40
  • Technically, we can do it in O(N), e.g. we can use `Min Heap`: we loop over `array`, while adding item to the heap; when size of it reaches `11` we take (and throw away) the top (min) item from the heap. – Dmitry Bychenko Dec 07 '19 at 19:41
  • There is no built in MinHeap (or PriorityQueue) either in .Net. – Jeno Csupor Dec 09 '19 at 09:55
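
The min-heap idea from the comments above could be sketched roughly as follows. This assumes .NET 6 or later, where `System.Collections.Generic.PriorityQueue<TElement, TPriority>` is available (it did not exist when this thread was written). The names `TopItems` and `TopN` are hypothetical, chosen only for illustration. The heap never holds more than n + 1 items, so the cost is O(N log n) for N input items, which is effectively linear for a small fixed n.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of LINQ or of any answer above.
static class TopItems
{
    // Returns up to n largest values of source, in descending order.
    public static int[] TopN(IEnumerable<int> source, int n)
    {
        // PriorityQueue dequeues the lowest priority first, so it acts as a min-heap.
        var heap = new PriorityQueue<int, int>();
        foreach (var item in source)
        {
            heap.Enqueue(item, item);
            if (heap.Count > n)
            {
                heap.Dequeue(); // throw away the current smallest item
            }
        }

        // The heap yields items smallest first; fill the result from the back.
        var result = new int[heap.Count];
        for (int i = result.Length - 1; i >= 0; i--)
        {
            result[i] = heap.Dequeue();
        }
        return result;
    }
}
```

For example, `TopItems.TopN(new[] { 5, 1, 9, 3, 7 }, 3)` returns 9, 7, 5.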

Max runs in O(n), while ordering takes O(n log n) at best. The first problem in your code is that you order the array 10 times, which is the worst scenario. You can order it once and take 10 items, as Dmitry answered. Also, calling Max 10 times does not give you the 10 biggest values, just the biggest value 10 times.

Max iterates the list once and keeps the largest value seen so far in a separate variable. You can write a similar method that iterates your array once and keeps the 10 biggest values in your maxResults; this is the fastest way to get the result.
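
A minimal sketch of that single-pass approach, under the assumption that the number of values to keep is small compared to the array. The names `TopValues` and `Largest` are hypothetical. It keeps a small buffer sorted in descending order and inserts each candidate by shifting smaller entries, so it needs no heap type and works on older frameworks too.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the "rewrite Max to keep several values" idea.
static class TopValues
{
    // Returns up to count largest values of source, in descending order.
    public static int[] Largest(IEnumerable<int> source, int count)
    {
        if (count <= 0) return new int[0];

        var top = new int[count]; // kept sorted in descending order
        int filled = 0;           // number of valid entries in top

        foreach (int value in source)
        {
            // Buffer is full and value does not beat the smallest kept value: skip it.
            if (filled == count && value <= top[filled - 1]) continue;

            // Start at the first free slot, or overwrite the smallest value when full.
            int pos = filled < count ? filled : count - 1;

            // Shift smaller entries one slot to the right to make room.
            while (pos > 0 && top[pos - 1] < value)
            {
                top[pos] = top[pos - 1];
                pos--;
            }
            top[pos] = value;
            if (filled < count) filled++;
        }

        // Trim the buffer if the source had fewer than count items.
        Array.Resize(ref top, filled);
        return top;
    }
}
```

With the question's array, `TopValues.Largest(array, 10)` would make one pass over the million items instead of sorting them all. Each item costs at most `count` shifts, so this stays close to the speed of Max only while `count` is small.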

  • It's impressive how many people are talking beside the point. I called it 10 times just to measure the time more accurately. What I was looking for, but in LINQ, is in my answer (MoreLINQ). – Jeno Csupor Dec 09 '19 at 09:07
  • So your answer is the time consumption of the Max and OrderBy methods. Max runs in O(n) and ordering in O(n log n). There is no way to make OrderBy or OrderByDescending faster than Max. – Danial Kalhori Dec 09 '19 at 13:25

It seems that others have filled the efficiency gap that Microsoft has left in LINQ to Objects. MoreLINQ provides a PartialSort method: https://morelinq.github.io/3.1/ref/api/html/M_MoreLinq_MoreEnumerable_PartialSort__1_3.htm

Jeno Csupor