0

I'm joining data from 2 tables to get the last transaction date of a customer in my database. This is a simplified example of what I'm trying to do:

SELECT c.CustomerID
       c.CustomerFirstName
       c.CustomerLastName
   MAX(t.TransactionDate)

FROM [db].[customertable] C

LEFT JOIN [TransactionTable] T on c.CustomerID = t.CustomerID

GROUP BY c.CustomerID

ORDER BY t.TransactionDate

However when executing this code I get this error message:

'Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause'

I've spent some time trying to trouble shoot this and found that putting MAX() in front of the customer first name and last name works, gets rid of the error message and the output seems to be what I want.

Though I'm wondering if this is bad practice? I'm grouping by CustomerID and this is a unique reference - there will only be one first name and one last name belonging to each CustomerID so therefore am I right in saying putting the max in front of the first name / last name wont have any impact other than to make my query work?

Other solutions I found online suggested sub queries but I'm still learning SQL so not sure how to do that.

Thanks

  • 1
    The error is quite explicit here; what part of the error don't you understand here so that we can try to elaborate. – Thom A Dec 27 '19 at 12:35
  • 1
    T-SQL require all non-aggregated columns to be enclosed with agg function: https://stackoverflow.com/questions/33629168/group-by-clause-in-mysql-and-postgresql-why-the-error-in-postgresql/33629201#33629201 – Lukasz Szozda Dec 27 '19 at 12:35
  • Related: T-SQL does not suppport [T301 aka Functional dependencies](https://stackoverflow.com/a/54010993/5070879) – Lukasz Szozda Dec 27 '19 at 12:37
  • Use c.CustomerFirstName, c.CustomerLastName in group by – Kiran Patil Dec 27 '19 at 12:52

1 Answers1

1

Just include all the non-aggregated columns:

SELECT c.CustomerID, c.CustomerFirstName, c.CustomerLastName,
       MAX(t.TransactionDate)
FROM [db].[customertable] C LEFT JOIN
     [TransactionTable] T
     ON c.CustomerID = t.CustomerID
GROUP BY c.CustomerID, c.CustomerFirstName, c.CustomerLastName;

Often, this type of query would be faster using a correlated subquery or lateral join (i.e. apply):

select c.*,  -- or whatever columns you want
       t.max_TransactionDate
from [db].[customertable] C outer apply
     (select max(t.TransactionDate) as max_TransactionDate
      from [TransactionTable] t
      where c.CustomerID = t.CustomerID
     ) t;

In particular, with an index on TransactionTable(CustomerID, TransactionDate desc), this should be noticeably faster than the version with the group by.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786