Disambiguating columns used in group by that come from select

Question

Assume a table person exists with columns id, first_name, last_name and table 'dog' exists with columns id, owner_id, name exist.

Take the following query:

Select
  first_name + last_name as name,
  count(*) dog_count
from
  person
  join dogs on person.id = dog.owner_id
group by
  person.id,
  first_name + last_name;

This query works and for this purpose, but has 1 thing that erks me whenever I have to write group by clauses and I went searching for solutions today and didn't really ever find one that I liked.

name from the query select clause doesn't exist when it processes the group by (due to the order of execution). Many RDMS allow it anyways as an extension, but even most (I think all) of those will run afoul of the following problem: if I use name instead of first_name + last_name in the group by clause, instead of going to the select clause to use name from there it'll to go name from the dog table.

I don't know if it's part of the standard, but many RDMS allow you to use column numbers instead to specify columns from the select, so I can do group by 1, but this becomes problematic with larger queries that may be dynamically specifying columns so that the ordering may change.

So... the final question: does any RDMS that anybody is aware of have a way to disambiguate by column name (or other consistent attribute) when pulling columns from the select clause? An example of something that could work would be group by select.name to specify 'take the column name from the select clause and apply it to the group by clause`?

Does anybody know if discussion on the topic has ever been brought as something to make it into SQL standards?

My apologies, I never did come back to this. It's a violation of DRY. I want to be able to make changes to one part of a piece of code and have it properly propogate through the rest of the code. If I wanted to change name to also have a middle initial, I'd have to change it in multiple locations. — Darinth, Mar 03 '22 at 22:38
Well, if you want to be that granular with DRY, I suppose it’s a violation. When I think of DRY, I think of not writing the same query twice, not whether or not one change require updates to two different lines within the same query. In that regard, I’d have to disagree. I did, however, provide you a way around this in my answer. — Brendan McCaffrey, Mar 03 '22 at 22:48

Brendan McCaffrey · Answer 1 · 2022-02-25T22:54:17.903

You're right that GROUP BY happens before SELECT. Therefore, GROUP BY does not yet know that first_name + last_name is aliased as name in the SELECT, rather than expecting that name would be referring to dog.name.

If it really bothers you, you can write your group by as this.

group by
    person.id,
    person.first_name,
    person.last_name;

However, in general, regarding ambiguity, you can always put the table name before the column name to specify which table you're referring to. Assuming that column name actually exists in the base table. That still would not work there, though, as mentioned. However, it would work in an ORDER BY clause, since that happens after the SELECT.

Disambiguating columns used in group by that come from select

1 Answers1