6

I am using a table that looks like this:

userID, eventDate, eventName
1  2019-01-01  buySoup
2  2019-01-01  buyEggs
2  2019-01-03  buyMilk
2  2019-01-04  buyMilk
3  2019-01-02  buyBread
3  2019-01-03  buyBread

My current query is:

SELECT
    userID,
    FIRST_VALUE(eventName) OVER (
        PARTITION BY userID ORDER BY eventDate ASC
    ) AS firstBought 
FROM table 
GROUP BY userID

I feel like this should return:

userID, firstBought
1  buySoup
2  buyEggs
3  buyBread

Instead, it gives the error:

'ERROR: Column "table.eventName" must appear in the GROUP BY clause or be used in an aggregate function'

Is there a way to grab this value without including it in the group by function, or creating a sub query? I'm using PostgreSQL.

If I do include it in the group by clause, it returns

userID, firstBought
1  buySoup
2  buyEggs
2  buyEggs
2  buyEggs
3  buyBread
3  buyBread

I understand that I could make it a subquery and then group by userID, firstBought, but I'd rather not create another subquery.

bignose
  • 30,281
  • 14
  • 77
  • 110
Cyborgcanoe
  • 61
  • 1
  • 4

4 Answers4

8

Instead of group by, use select distinct:

select distinct userID,
       FIRST_VALUE(eventName) over (partition by userID order by eventDate ASC) as firstBought 
from table ;

Or, you can use arrays:

select userId,
       (array_agg(eventName order by eventDate))[1] as firstBought
from table
group by userId;

Postgres doesn't have a "first" aggregation function, but this works pretty well.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Although that's useful, and works exactly as intended, my table actually also includes more columns, and my query includes max() and other aggregate functions on those columns. This means I do need the group by at the end. Is there any other solution? – Cyborgcanoe Feb 25 '19 at 21:28
  • @Cyborgcanoe . . . This answer provides two solutions. The second one uses `group by`. – Gordon Linoff Feb 25 '19 at 21:30
  • thank you, the second answer is great. Unfortunately, I'm actualy using Vertica Analytic Database v9.1.1-5, a branch of postgres which doesn't include array_agg. Thanks anyways! – Cyborgcanoe Feb 25 '19 at 22:23
  • @Cyborgcanoe . . . You should correctly tag your questions. – Gordon Linoff Feb 26 '19 at 01:46
3

I guess PostgreSQL's DISTINCT ON could do the trick:

SELECT DISTINCT ON (userid)
       userid, eventdate, eventname
FROM "table"
ORDER BY (eventdate);

This will give you the row per userid with the minimum eventdate.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
2

FIRST_VALUE is not an aggregate function. It is an analytic window function. So your base query does not need a GROUP BY clause. It should be re-written as:

SELECT 
        userID,
        FIRST_VALUE(eventName) over (PARTITION BY userID ORDER BY eventDate ASC) AS firstBought
FROM table;

From one of your above comments it sounds like there are other functions that you are using including aggregate functions like MAX. To accomplish what you are trying to do, you will need to use the above query as a subquery. This will allow you to use aggregate functions and get unique values from your base query. The query can look something like this (I added a price column as an example).

SELECT userID, firstBought, MAX(price)
FROM (
        SELECT userID, price, FIRST_VALUE(eventName) over (partition by userID order by eventDate ASC) as firstBought 
        from test
) x
GROUP BY userId, firstBought;

This should do the trick! You can use other aggregate functions on the outside query and an additional window functions in the subquery.

A. Saunders
  • 815
  • 1
  • 6
  • 19
1

I agree with A. Saunders.

You need an outside query.

With the exception of SELECT DISTINCT , which actually boils down to a GROUP BY all columns of the SELECT list, you can't mix OLAP and GROUP BY functions into the same SELECT.

So , if you do have MAX(), you have to:

WITH -- your input data ...
input(userID,eventDate,eventName) AS (
          SELECT 1,DATE '2019-01-01','buySoup'
UNION ALL SELECT 2,DATE '2019-01-01','buyEggs'
UNION ALL SELECT 2,DATE '2019-01-03','buyMilk'
UNION ALL SELECT 2,DATE '2019-01-04','buyMilk'
UNION ALL SELECT 3,DATE '2019-01-02','buyBread'
UNION ALL SELECT 3,DATE '2019-01-03','buyBread'
)
,
getfirstbought AS (
  SELECT 
    userid
  , eventdate
  , FIRST_VALUE(eventname) OVER (
      PARTITION BY userid ORDER BY eventdate
   ) AS firstbought
  FROM input
)
SELECT
  userid
, firstbought
, MAX(eventdate) AS maxdt
FROM getfirstbought
GROUP BY 1,2;
-- out  userid | firstbought |   maxdt    
-- out --------+-------------+------------
-- out       2 | buyEggs     | 2019-01-04
-- out       3 | buyBread    | 2019-01-03
-- out       1 | buySoup     | 2019-01-01
-- out (3 rows)
-- out 
-- out Time: First fetch (3 rows): 22.157 ms. All rows formatted: 22.208 ms
marcothesane
  • 6,192
  • 1
  • 11
  • 21