How to use a group by with wildcards in Postgres

Question

I'm having trouble with this code in Postgres. I'd like to get all fields in the user table, but only group by id. This works well on MySQL and SQLite, but after googling I've found out that this behaviour isn't part of the standard.

SELECT u.*, p.*, count(o.id) AS order_count
FROM shop_order AS o
LEFT JOIN user AS u ON o.user_id = u.id
LEFT JOIN userprofile AS p ON p.user_id = u.id
WHERE o.shop_id = <shop_id>
GROUP BY u.id

(Please note that <shop_id> is being passed programmatically, so you could just as easily replace it with any integer.)

This is the error I'm getting

column "u.id" must appear in the GROUP BY clause or be used in an aggregate function

I'm an SQL newbie, so be gentle if this is obvious :)

Does it works if you explicitly list all the fields instead of using `*`? — dschulz, Apr 16 '12 at 07:44
If this is PostgreSQL 9.1, I would expect it to work if you `GROUP BY u.id, p.id`, assuming that `id` is the primary key of each table. (I probably *shouldn't* assume that, because having a column called `id` as the primary key of every table is horrible practice IMO, but it seems to be done more often than not.) — kgrittn, Apr 16 '12 at 09:55

score 4 · Answer 1 · answered Apr 16 '12 at 07:44

Unfortunately I don't think you can do it without listing all the columns in user and userprofile. (Use of wildcards is often considered to be something you do for simple ad-hoc queries only, and I guess this is one of the reasons!) MySQL and other DBs have a "convenient" feature that if a column is neither in the GROUP BY clause nor in an aggregate function, then any of the values for that column in each group can be shown in the result. PostgreSQL is a bit stricter.

But normally, only a few columns of the table are sufficient for the grouping. There are lots of ways to rewrite it. Perhaps what you can do instead is:

SELECT u.*, p.*, (select count(o.id) FROM shop_order AS o WHERE o.user_id = u.id AND o.shop_id = <shop_id>) AS order_count
FROM user AS u
LEFT JOIN userprofile AS p ON p.user_id = u.id

score 2 · Answer 2 · answered Apr 16 '12 at 07:44

You could use row_number to limit the result to one row per user:

select  *
from    (
        select  row_number() over (partition by u.id) as rn
        ,       *
        join    user u
        on      sub.user_id = u.id
        join    shop_order so
        on      so.user_id = u.id
        left join
                userprofile up
        on      up.user_id = u.id
        where   so.shop_id = <shop_id>
        ) as SubQueryAlias
where   rn = 1

The subquery is required because Postgres does not allow window functions like row_number in the where clause.

score 2 · Answer 3 · answered Apr 16 '12 at 07:55

You are trying to ask the database to to count something quite un-specific You want to count the orders but then you want to display all of the details, what if the user profile contains more than one row per user and those rows are different, you will get more than one row for each u.id and you specified you only wanted one Thats why the database is saying " go away and tell me what you really want "

Michael Buen · Answer 4 · 2012-04-16T08:07:03.327

What's your Postgres version? If you are using Postgres 9.0 and below, your code construct won't run. e.g. this is not allowed:

SELECT
    p.id,
    p.firstname, p.lastname, p.address, -- these auxiliary fields is helpful to program users
    count(*)
FROM
    people p
    JOIN visits v ON p.id = v.person_id
GROUP BY p.id

Must do this on 9.0 and below:

SELECT
    p.id,
    p.firstname, p.lastname, p.address, -- these auxiliary fields is helpful to program users
    count(*)
FROM
    people p
    JOIN visits v ON p.id = v.person_id
GROUP BY p.id
    ,p.firstname, p.lastname, p.address; -- these ancillary fields aids the RDBMS on preventing
    -- programmers from accidentally committing the same mistake as MySQL programmers.

If you are using Postgres 9.1, you can certainly use only one field in GROUP BY even you have many fields in SELECT clause, as long as that GROUPed field(s) are primary keys. Hence this will run:

SELECT
    p.id,
    p.firstname, p.lastname, p.address, -- these auxiliary fields is helpful to program users
    count(*)
FROM
    people p
    JOIN visits v ON p.id = v.person_id
GROUP BY p.id 
-- note that there's no need to repeat the SELECTed fields on GROUP BY clause

Postgres 9.1 can facilitate functional dependencies on primary keys, hence including only the primary key(s) on GROUP BY clause is allowed

SQL Fiddle example: http://sqlfiddle.com/#!1/3b857/3

If you assume one user can have multiple orders per shop, there is no functional dependency in this case. The OP is relying on MySQL's specific behavior where it picks the value from the first row it encounters for non-grouped columns. — Andomar, Apr 16 '12 at 08:07
Quoting the question, "This works well on MySQL and SQLite, but after googling I've found out that this behaviour isn't part of the standard." — Andomar, Apr 16 '12 at 08:17

score 2 · Answer 5 · answered Apr 16 '12 at 08:10

2

Which version of the PostgreSQL you're using? I believe it is pre-9.1 one.

Starting from version 9.1 PostgreSQL is also supporting the feature you're looking for in case you do GROUP BY primary key columns. So maybe upgrade is a good option here.

answered Apr 16 '12 at 08:10

vyegorov

21,787
7
59
73

Assuming there are multiple orders per user per shop, the 9.1 feature you're referring to won't help – Andomar Apr 16 '12 at 08:18
It will help for `orders`, as he's doing aggregate on the only orders' columns. It will not help for `userprofiles` though, I do agree. The PK of the `userprofiles` needs to be included in the `GROUP BY` section. – vyegorov Apr 16 '12 at 08:29
Good point about `shop_order` only being used in the aggregate! – Andomar Apr 16 '12 at 08:37

How to use a group by with wildcards in Postgres

5 Answers5