2

I raised a related question about:
'Invalid column name [ColumnName]' on a nested linq query

I'm trying to get the top row of a group as asked here:
Linq - Top value from each group

However, I approached it different to the accepted answer using Max. I did it this way:

ATable
    .GroupBy (t => t.ID)
    .Select ( t => t.OrderByDescending(x=>x.Timestamp).FirstOrDefault().Timestamp)

This results in the error SqlException: Invalid column name 'ID'

Now when I use the Max function:

ATable
    .GroupBy (t => t.ID)
    .Select ( t => t.Max(x=>x.Timestamp))

It returns the list of times that I wanted.

My understanding is that they are different but equivalent calls. Why am I getting the sql error on the first call?

Update
The sql query generated for the first query looks like this:

SELECT (
    SELECT [t3].[Timestamp]
    FROM (
        SELECT TOP 1 [t2].[Timestamp]
        FROM [ATable] AS [t2]
        WHERE (([t1].[ID] IS NULL) AND ([t2].[ID] IS NULL)) OR (([t1].[ID] IS NOT NULL) AND ([t2].[ID] IS NOT NULL) AND ([t1].[ID] = [t2].[ID]))
        ORDER BY [t2].[Timestamp] DESC
        ) AS [t3]
    ) AS [value]
FROM (
    SELECT [t0].[ID]
    FROM [ATable] AS [t0]
    GROUP BY [t0].[ID]
    ) AS [t1]

I managed to reduce it down but keeping the same error:

SELECT (
    SELECT [t3].[Timestamp]
    FROM (
        SELECT TOP 1 [Timestamp]
        FROM [ATable]
        WHERE [t1].[ID] = [ID]
        ) AS [t3]
    ) AS [value]
FROM (
    SELECT [ID]
    FROM [ATable]
    GROUP BY [ID]
    ) AS [t1]

Update2
Some answers have said that these queries are logically different, however, if you do something like this on the northwind table, it will produce the same result:

OrderDetails
    .GroupBy(x=>x.ProductID)
    .Select(x=>x.OrderByDescending(y=>y.UnitPrice).FirstOrDefault().UnitPrice).Dump();

OrderDetails
    .GroupBy(x=>x.ProductID)
    .Select(x=>x.Max(y=>y.UnitPrice)).Dump();

So I cannot accept any of the 3 answers that say the queries are different or that Max is giving a Sum instead of the max value of a list or that you have to convert the IQueriable to a list before proceeding.

Community
  • 1
  • 1
Joe
  • 11,147
  • 7
  • 49
  • 60
  • Can you somehow get me the generated SQL query for the first query? serverlogs should have them. – Femaref Mar 09 '11 at 02:17
  • As a response to the answers provided, if you follow the second link, there is a solution offered (not accepted) that uses OrderByDescending but does not chain the .Timestamp or in general .[FieldID]. That gives back the first row.. but chaining the field doens't work (even with the FirstOrDefault). – Joe Mar 09 '11 at 02:18
  • If it works without chained `Timestamp` there must be a major difference between both queries. – Femaref Mar 09 '11 at 02:20

1 Answers1

4

Update 2: I just want to respond to this:

however, if you do something like this on the northwind table, it will produce the same result [...] So I cannot accept any of the 3 answers that say the queries are different

I did not mean to dispute that they should produce the same result. To illustrate my point better, let me provide an analogy.

Let's say I give you a shuffled pile of index cards numbered from 1 to 50 and I say, "Put these all in order." So, you go through and sort them out, making sure the pile goes from 1 to 50. This takes you a minute or two. Then I say, "Now give me the card on the top of the pile." This card would happen to be the card numbered 50.

On the other hand, suppose in the first place all I said was, "Give me the card with the highest number from this pile." Then all you would do is go through the cards and find the one with the highest number, without needing to do any sorting whatsoever. (I mean, you could sort the pile first, but that would clearly be doing more than I asked.)

So what I'm saying is that the results of the above two sets of instructions would be the same; however, they are clearly different instructions.

Now, it would not surprise me at all if certain implementations were intelligent enough to translate these into identical (optimized) SQL queries; all I'm saying is that, since they are different instructions, it should not be so surprising if the resulting SQL is different (and though it may be unfortunate, it's therefore also not too surprising that one version could cause an error when the other doesn't).


Update: Again, I can't really give you a detailed response in terms of Linq-to-SQL; but I will point out that while Max should in theory produce the same result as OrderByDescending together with FirstOrDefault, it is not really the same from a conceptual standpoint.

Consider Linq-to-objects. Calling Max would iterate over the sequence and accumulate a maximum value. Calling OrderByDescending would sort the entire sequence and then take the top.

Obviously, Linq-to-SQL is another animal; however, it makes sense that since these queries represent conceptually different approaches to obtain a certain result, the SQL they produce could very well be (and, as you have seen, is) different.


Original Answer

I can't really give you a detailed response in terms of Linq-to-SQL, but I will point out that there is a very serious difference between your two alternatives (they are not equivalent):

Max(x => x.Timestamp)

The above should return the greatest timestamp.

OrderByDescending(x => x.Timestamp()).FirstOrDefault()

The above should return the record with the greatest timestamp.

These are very different beasts.

Dan Tao
  • 125,917
  • 54
  • 300
  • 447
  • in essence, both are the same though, he could simply chain `.Timestamp` after the `FirstOrDefault` call, the problem should be in translating the expression tree into an sql statement. – Femaref Mar 09 '11 at 02:11
  • my bad, i am actually chaining `.Timestamp` to get the error, merely getting the FirstOrDefault() does not give me the error. updating question now. – Joe Mar 09 '11 at 02:14
  • @Femaref: he *could* do that (and maybe he did); I just figured I should point this out in case the OP was under a wrong impression about what `Max` does (I know I've made the mistake of thinking it'll return the record *with* the max more than once before). – Dan Tao Mar 09 '11 at 02:15
  • @Joe: Thanks for clearing that up. I have one additional observation to make; then I'll be on my way... – Dan Tao Mar 09 '11 at 02:15
  • 1
    yup, I fall for that sometimes as well. Need `MaxBy` operator. – Femaref Mar 09 '11 at 02:16
  • @Dan what I understand you saying is that MAX is a sum, but from my query and from msdn [Returns the maximum value in a sequence of values.](http://msdn.microsoft.com/en-us/library/system.linq.enumerable.max.aspx) this seems to contradict what you're saying. I'm confused. – Joe Mar 09 '11 at 02:39
  • @Joe: "Accumulate" can mean any folding operation, of which addition is probably the most common but definitely not the only one. In this context, it definitely does not mean summation. – Ben Voigt Mar 09 '11 at 03:17
  • @Ben ahh ok, i see the diff now. reading about folding in wiki helped me understand why accumulation != sum. i thought they were the same. – Joe Mar 09 '11 at 03:24
  • @Joe: Sorry, I did use the word "accumulate" in perhaps a fairly loose way. I simply meant that it involves a calculation that takes into account every element in a sequence. As @Ben pointed out, it does not have to mean sum. – Dan Tao Mar 09 '11 at 04:02
  • @Dan thanks for clearing it up. In fact, i think i've proved your last statement about the implementation correct. i've just tried to run the same query, same dataset on sql 2000 and sql 2005 and the 2000 came up with the Column Name error whilst 2005 did not. I'll chase the creator up about it in the other question. thanks for your thorough explaination. – Joe Mar 09 '11 at 05:05