Aggregate SQL Function to grab only the first from each group

Question

I have 2 tables - an Account table and a Users table. Each account can have multiple users. I have a scenario where I want to execute a single query/join against these two tables, but I want all the Account data (Account.*) and only the first set of user data (specifically their name).

Instead of doing a "min" or "max" on my aggregated group, I wanted to do a "first". But, apparently, there is no "First" aggregate function in TSQL.

Any suggestions on how to go about getting this query? Obviously, it is easy to get the cartesian product of Account x Users:

 SELECT User.Name, Account.* FROM Account, User
 WHERE Account.ID = User.Account_ID

But how might I got about only getting the first user from the product based on the order of their User.ID ?

SQL Server is worse off because it has no FIRST. I have not heard a convincing explanation for why it does not exist in SQL Server. Sometimes it doesn't matter what order they are in (if they all have the same value in a column for a particular group) and sometimes it does (and they are ordered). Either way FIRST() would have a use. — micahhoover, Oct 06 '11 at 17:16

score 27 · Accepted Answer · answered Apr 21 '09 at 16:19

27

Rather than grouping, go about it like this...

select
    *

from account a

join (
    select 
        account_id, 
        row_number() over (order by account_id, id) - 
            rank() over (order by account_id) as row_num from user
     ) first on first.account_id = a.id and first.row_num = 0

answered Apr 21 '09 at 16:19

Adam Robinson

182,639
35
285
343

interesting, I didn't realize you could do something like first.row_num = 0 – Matt Apr 21 '09 at 16:39
5

I see you used Rank() here, then subtracted it from Row_Number(), and looked for 0. I would have use ONLY Row_Number() (with it Partitioned by Account_ID) and filtered on Row_Num = 1 . The results would be the same (and maybe technically faster). See @AaronLS 's example: http://stackoverflow.com/a/9220232/555798 – MikeTeeVee Jan 02 '15 at 18:14
4

@MikeTeeVee Agreed; that is a better solution, and it's what I would have come up with were I solving that problem today. – Adam Robinson Jan 02 '15 at 18:22
I didn't understand the row_num = 0 / 1 part. I solved removing such part and using a where clause in the outer query – Marco Marsala Jun 05 '15 at 07:46

Dominic Goulet · Answer 2 · 2012-10-12T12:25:48.520

I know my answer is a bit late, but that might help others. There is a way to achieve a First() and Last() in SQL Server, and here it is :

Stuff(Min(Convert(Varchar, DATE_FIELD, 126) + Convert(Varchar, DESIRED_FIELD)), 1, 23, '')

Use Min() for First() and Max() for Last(). The DATE_FIELD should be the date that determines if it is the first or last record. The DESIRED_FIELD is the field you want the first or the last value. What it does is :

Add the date in ISO format at the start of the string (23 characters long)
Append the DESIRED_FIELD to that string
Get the MIN/MAX value for that field (since it start with the date, you will get the first or last record)
Stuff that concatened string to remove the first 23 characters (the date part)

Here you go!

EDIT: I got problems with the first formula : when the DATE_FIELD has .000 as milliseconds, SQL Server returns the date as string with NO milliseconds at all, thus removing the first 4 characters from the DESIRED_FIELD. I simply changed the format to "20" (without milliseconds) and it works all great. The only downside is if you have two fields that were created at the same seconds, the sort can possibly be messy... in which cas you can revert to "126" for the format.

Stuff(Max(Convert(Varchar, DATE_FIELD, 20) + Convert(Varchar, DESIRED_FIELD)), 1, 19, '')

EDIT 2 : My original intent was to return the last (or first) NON NULL row. I got asked how to return the last or first row, wether it be null or not. Simply add a ISNULL to the DESIRED_FIELD. When you concatenate two strings with a + operator, when one of them is NULL, the result is NULL. So use the following :

Stuff(Max(Convert(Varchar, DATE_FIELD, 20) + IsNull(Convert(Varchar, DESIRED_FIELD), '')), 1, 19, '')

I noticed a significant performance hit using min() instead of a nested select with a top 1. I think the reason is that min iterates the entire data set, where top 1 just takes the first one it comes across. — Des Horsley, Jul 14 '16 at 00:46

score 11 · Answer 3 · answered Feb 09 '12 at 22:56

Select *
From Accounts a
Left Join (
    Select u.*, 
    row_number() over (Partition By u.AccountKey Order By u.UserKey) as Ranking
    From Users u
  ) as UsersRanked
  on UsersRanked.AccountKey = a.AccountKey and UsersRanked.Ranking = 1

This can be simplified by using the Partition By clause. In the above, if an account has three users, then the subquery numbers them 1,2, and 3, and for a different AccountKey, it will reset the numnbering. This means for each unique AccountKey, there will always be a 1, and potentially 2,3,4, etc.

Thus you filter on Ranking=1 to grab the first from each group.

This will give you one row per account, and if there is at least one user for that account, then it will give you the user with the lowest key(because I use a left join, you will always get an account listing even if no user exists). Replace Order By u.UserKey with another field if you prefer that the first user be chosen alphabetically or some other criteria.

score 6 · Answer 4 · answered Oct 29 '15 at 12:48

6

I've benchmarked all the methods, the simpelest and fastest method to achieve this is by using outer/cross apply

SELECT u.Name, Account.* FROM Account
OUTER APPLY (SELECT TOP 1 * FROM User WHERE Account.ID = Account_ID ) as u

CROSS APPLY works just like INNER JOIN and fetches the rows where both tables are related, while OUTER APPLY works like LEFT OUTER JOIN and fetches all rows from the left table (Account here)

answered Oct 29 '15 at 12:48

fire in the hole

1,171
15
34

2

This query can give inconsistent result. SELECT TOP 1 without SORT BY can return any of match of query, it depends on SqlServer Engine. And Thus such result can give "random results". – Tomas Kubes Dec 03 '15 at 07:58
Good point but easy to add a sort by. – Craig.C Feb 02 '23 at 10:46

score 4 · Answer 5 · answered Sep 28 '15 at 21:23

4

You can use OUTER APPLY, see documentation.

SELECT User1.Name, Account.* FROM Account
OUTER APPLY 
    (SELECT  TOP 1 Name 
    FROM [User]
    WHERE Account.ID = [User].Account_ID
    ORDER BY Name ASC) User1

answered Sep 28 '15 at 21:23

Tomas Kubes

23,880
18
111
148

score 3 · Answer 6 · answered Jul 05 '12 at 14:05

3

The STUFF response from Dominic Goulet is slick. But, if your DATE_FIELD is SMALLDATETIME (instead of DATETIME), then the ISO 8601 length will be 19 instead of 23 (because SMALLDATETIME has no milliseconds) - so adjust the STUFF parameter accordingly or the return value from the STUFF function will be incorrect (missing the first four characters).

answered Jul 05 '12 at 14:05

mweaver

208
2
4

Thanks for the comment! I noticed that too some weeks ago, updated my answer. This also happens when your datetime has .000 as milliseconds, they are just stripped and you loose the first 4 characters. I changed the format from 126 to 20 to always cut the milliseconds, it's now working great! – Dominic Goulet Jul 06 '12 at 11:33

score 3 · Answer 7 · answered Apr 21 '09 at 16:19

3

SELECT (SELECT TOP 1 Name 
        FROM User 
        WHERE Account_ID = a.AccountID 
        ORDER BY UserID) [Name],
       a.*
FROM Account a

answered Apr 21 '09 at 16:19

Jimmie R. Houts

7,728
2
31
38

However, this approach will execute another select statement for every account row. If you have 1000 accounts, your query will execute 1001 independent select statements) – Adam Robinson Apr 21 '09 at 16:21
Not a big deal for small tables, but your solution is better :) – Jimmie R. Houts Apr 21 '09 at 16:35

score 2 · Answer 8 · answered Dec 02 '11 at 10:05

First and Last do not exist in Sql Server 2005 or 2008, but in Sql Server 2012 there is a First_Value, Last_Value function. I tried to implement the aggregate First and Last for Sql Server 2005 and came to the obstacle that sql server does guarantee the calculation of the aggregate in a defined order. (See attribute SqlUserDefinedAggregateAttribute.IsInvariantToOrder Property, which is not implemented.) This might be because the query analyser tries to execute the calculation of the aggregate on multiple threads and combine the results, which speeds up the execution, but does not guarantee an order in which elements are aggregated.

Welcome to Stack Overflow! Be careful when posting copy and paste boilerplate/verbatim answers to multiple questions, these tend to be flagged as "spammy" by the community. If you're doing this then it usually means the questions are duplicates so flag them as such instead. — Kev, Dec 02 '11 at 11:08

score 1 · Answer 9 · answered Apr 21 '09 at 16:20

Define "First". What you think of as first is a coincidence that normally has to do with clustered index order but should not be relied on (you can contrive examples that break it).

You are right not to use MAX() or MIN(). While tempting, consider the scenario where you the first name and last name are in separate fields. You might get names from different records.

Since it sounds like all your really care is that you get exactly one arbitrary record for each group, what you can do is just MIN or MAX an ID field for that record, and then join the table into the query on that ID.

He said first based upon their user id – Adam Robinson Apr 21 '09 at 16:21 — Adam Robinson, Apr 21 '09 at 16:21

score 0 · Answer 10 · answered Jan 09 '17 at 22:38

(Slightly Off-Topic, but) I often run aggregate queries to list exception summaries, and then I want to know WHY a customer is in the results, so use MIN and MAX to give 2 semi-random samples that I can look at in details e.g.

SELECT Customer.Id, COUNT(*) AS ProblemCount
      , MIN(Invoice.Id) AS MinInv, MAX(Invoice.Id) AS MaxInv
FROM Customer
INNER JOIN Invoice on Invoice.CustomerId = Customer.Id
WHERE Invoice.SomethingHasGoneWrong=1
GROUP BY Customer.Id

score 0 · Answer 11 · answered Apr 21 '17 at 14:37

Create and join with a subselect 'FirstUser' that returns the first user for each account

SELECT User.Name, Account.* 
FROM Account, User, 
 (select min(user.id) id,account_id from User group by user.account_id) as firstUser
WHERE Account.ID = User.Account_ID 
 and User.id = firstUser.id and Account.ID = firstUser.account_id

score 0 · Answer 12 · answered Apr 21 '09 at 16:19

0

There are a number of ways of doing this, here a a quick and dirty one.

Select (SELECT TOP 1 U.Name FROM Users U WHERE U.Account_ID = A.ID) AS "Name,
    A.*
FROM Account A

answered Apr 21 '09 at 16:19

Mitchel Sellers

62,228
14
110
173

Aggregate SQL Function to grab only the first from each group

12 Answers12

Linked

Related