
Is it true that random data inside a lazy statement might get evaluated differently at runtime? With the following code, I see "wow" printed to the console many times. However, if I force the result of the query (i.e. call ToList() on xs and ys), things seem to work fine.

    public static void Main(string[] args)
    {
        var generator = new Random();
        var xs = from x in Enumerable.Range(0, 20000)
                 select generator.Next();

        var ys = from y in Enumerable.Range(0, 5000)
                 select generator.Next();

        foreach (var x in xs)
        {
            var q1 = from y in ys where y > x select y;
            var q2 = from y in ys where y > x select y;

            if (!q1.SequenceEqual(q2))
                Console.WriteLine("wow!");
        }

        Console.WriteLine("done");
        Console.ReadLine();

    }

I suspect that this has to do with the fact that LINQ queries are "lazy". Is this accurate?

rookie
  • The "deferred execution" of LINQ queries is one of the most commonly misunderstood features. A query is always executed when the query variable is iterated over, not when the query variable is created. – Antonio Pelleriti May 21 '15 at 13:37
  • I really want to come up with an answer that "fixes" this by using a `new Random` for each iteration of `ys`, but I can't figure out how to do it without making it obvious :( – Rawling May 21 '15 at 13:40
  • @Rawling `from x in Enumerable.Range(0, 20000) let rnd = new Random() select rnd.Next(); ` – xanatos May 21 '15 at 13:41
  • @xanatos Nah, that doesn't seem to do it - you get a `new Random` for each `y` rather than each `ys`, and I think that slows it down enough that it breaks. – Rawling May 21 '15 at 13:42
  • @Rawling You are right: `(from useless in new int[1] let rnd = new Random() from x in Enumerable.Range(0, 20000) select rnd.Next()).ToArray();` – xanatos May 21 '15 at 13:46
  • @xanatos Nice - that gives similar results to my solution, which was to put it in an iterator method. – Rawling May 21 '15 at 13:48

1 Answer


Is it true that random data inside a lazy statement might get evaluated differently at runtime?

What is true is that, as you have written:

I suspect that this has to do with the fact that linq queries are "lazy". Is this accurate?

An important additional point is that query results aren't "materialized" or "cached" after execution, so every time you enumerate a query, its results are regenerated.
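As a minimal sketch of this re-execution, the snippet below counts how often a selector runs when the same query is enumerated several times. The class `DeferredDemo` and the method `CountSelectorCalls` are hypothetical names chosen for illustration; they are not part of the original code.

```csharp
using System;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical helper: returns how many times the selector ran
    // after enumerating the same deferred query `enumerations` times.
    public static int CountSelectorCalls(int size, int enumerations)
    {
        int calls = 0;

        // Building the query runs nothing yet; `calls` is still 0 here.
        var query = Enumerable.Range(0, size).Select(i => { calls++; return i; });

        for (int pass = 0; pass < enumerations; pass++)
        {
            // Each foreach starts a fresh enumeration and re-runs the selector.
            foreach (var item in query) { }
        }

        return calls;
    }

    public static void Main()
    {
        // 5 elements enumerated 3 times: the selector runs 15 times, not 5.
        Console.WriteLine(CountSelectorCalls(5, 3)); // prints 15
    }
}
```

If the selector called `generator.Next()` instead of incrementing a counter, each of those passes would see a different set of numbers.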

This line

    if (!q1.SequenceEqual(q2))
        Console.WriteLine("wow!");

forces q1 and q2 to be enumerated, and each of them in turn enumerates ys. So ys is "generated" twice on every iteration of the foreach loop.

So, since the foreach loop runs 20000 iterations, ys will be "generated" 40000 times.

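Random.Next() will be executed up to 20000 + (20000 * 2 * 5000) = 200,020,000 times (an upper bound, because SequenceEqual stops enumerating at the first mismatch), where

20000: the xs sequence, enumerated only once by the foreach loop
20000 * 2 * 5000: 20000 iterations, in each of which the ys sequence is enumerated twice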

Note what would have happened if you had written:

    var xs = (from x in Enumerable.Range(0, 20000)
              select generator.Next()).ToArray();

    var ys = (from y in Enumerable.Range(0, 5000)
              select generator.Next()).ToArray();

Here we are "materializing" the enumerables into arrays (two arrays, to be precise). Random.Next() will be called a grand total of 20000 + 5000 times, and all of those calls happen in these two statements. Enumerating xs and ys afterwards won't generate any new random numbers.
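A small sketch of the materialized case, under the assumption that counting calls in a wrapper lambda is an acceptable stand-in for inspecting `Random` itself. `MaterializeDemo` and `CallsAfterReuse` are hypothetical names introduced for illustration, and the sizes are reduced to keep the example quick.

```csharp
using System;
using System.Linq;

public static class MaterializeDemo
{
    // Hypothetical helper: materializes `size` random numbers with ToArray(),
    // enumerates the array several times, and reports how many times
    // Random.Next() was actually called.
    public static int CallsAfterReuse(int size, out bool stable)
    {
        var generator = new Random();
        int calls = 0;

        // ToArray() runs the query once, immediately; all Next() calls happen here.
        int[] ys = Enumerable.Range(0, size)
                             .Select(_ => { calls++; return generator.Next(); })
                             .ToArray();

        // Both sides read the same stored values, so the sequences match.
        stable = ys.SequenceEqual(ys);

        // Further enumerations (even filtered ones) do not touch the generator.
        int aboveZero = ys.Count(y => y > 0);
        int aboveHalf = ys.Count(y => y > int.MaxValue / 2);

        return calls;
    }

    public static void Main()
    {
        int calls = CallsAfterReuse(5000, out bool stable);
        Console.WriteLine(calls);  // prints 5000
        Console.WriteLine(stable); // prints True
    }
}
```

With the arrays in place, q1 and q2 from the question filter the same stored values, so SequenceEqual returns true on every iteration.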

xanatos