1

I have Table A with columns X,Y,Z.

X is an FK, Y is a description. Each X has exactly one corresponding Y. So if X stays the same over multiple records, Y stays the same too. So there may be any number of records where X and Y are the same.

Now I'm running the following query:

SELECT X, Y
FROM A
GROUP BY X;

Will this query work? Y is supposed to be grouped alongside X, but I didnt explicitely specify it in the query. Does MySQL still implicitely act this way though? And is this behavior reliable/standardized? Furthermore, will the results vary based on the datatype of Y. For example, is there a difference if Y is either VARCHAR, CHAR or INT? In case of an int, will the result be a SUM() of the grouped records?

Is the behavior MySQL will expose in such a case normed/standardized and where can I look it up?

GMB
  • 216,147
  • 25
  • 84
  • 135
Narktor
  • 977
  • 14
  • 34
  • reliable/standardized - I wouldn't like to rely on hope that all mysql implementations are the same and that mariadb is always in step with mysql (I mention mariadb since I know that some sites have a mix of mysql and mariadb). – P.Salmon Apr 02 '20 at 09:17

2 Answers2

3

Each X has exactly one corresponding Y

SELECT X, Y FROM A GROUP BY X;

Will this query work?

Technically, what happens when you run this query under MySQL depends on wether sql mode ONLY_FULL_GROUP_BY is enabled or not:

  • it it is enabled, the query errors: all non-aggregated columns must appear in the GROUP BY clause (you need to add Y to the GROUP BY clause)

  • else, the query executes, and gives you an arbitrary value of Y for each X; but since Y is functionnaly dependant on X, the value is actually predictable, so this is OK.

Generally, although the SQL standard does recognizes the notion of functionnaly-dependant column, it is a good practice to always include all non-aggregated colums in the GROUP BY clause. It is also a requirement in most databases other than MySQL (and, starting MySQL 5.7, ONLY_FULL_GROUP_BY is enabled by default). This also prevents you from various pitfalls and unpredictable behaviors.

Using ANY_VALUE() makes the query both valid and explicit about its purpose:

SELECT X, ANY_VALUE(Y) FROM A GROUP BY X;

Note that if you only want the distinct combinations of X, Y, it is simpler to use SELECT DISTINCT:

SELECT DISTINCT X, Y FROM A;
Community
  • 1
  • 1
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thanks a lot! :) Just to make sure I understand the term "non-aggregated column" correctly: In my query, Y is a non-aggregated column because I did not use a function for aggregation on it like SUM(Y) or COUNT(Y). Is that correct? Any Column in my SELECT statement to which I havent applied such functions are therefore "non-aggregated columns" in terms of the official docs? – Narktor Apr 02 '20 at 11:48
0

Your query will work if Y is functionally dependent on X (depending on SQL mode being used), but if you are trying to get distinct X,Y pairs from the table, it is better to use DISTINCT. The GROUP BY is meant to be used with the aggregate functions.

So you should use:

SELECT DISTINCT X, Y
FROM A;

A sample case where you would use GROUP BY would be with an aggregate functions:

SELECT DISTINCT X, Y, COUNT(*)
FROM A
GROUP BY  X, Y;
slaakso
  • 8,331
  • 2
  • 16
  • 27