0

I find that sometimes, my results are incorrectly sorted with the following SQL (MySQL) query:

SELECT u.id, u.firstName, u.lastName, u.email, u.ssoProfile, u.vip, SUM(p.number) AS totalPoints 
FROM (
      SELECT u.id FROM user u 
      ORDER BY u.firstName ASC, u.lastName ASC, u.email ASC 
      LIMIT 100, 10
) u2 
INNER JOIN `user` u 
ON u2.id = u.id 
LEFT JOIN `point` p 
ON u.id = p.user 
AND p.expiryDate > NOW() 
GROUP BY u.id

As you can see I sorted by users in the subquery. However, I noticed sometimes, very rarely, I get results that do not follow the specified order. I think I need to order by on the outer query? But I want to understand whats wrong with this query. Why are my users' order messed up?

UPDATE

I've added an order by to the outer query. And surprisingly, I find that my results can still be returned in the wrong order

SELECT u.id, u.firstName, u.lastName, u.email, u.ssoProfile, u.vip, SUM(p.number) AS totalPoints 
FROM (
      SELECT u.id FROM user u 
      ORDER BY u.firstName ASC, u.lastName ASC, u.email ASC 
      LIMIT 100, 10
) u2 
INNER JOIN `user` u 
ON u2.id = u.id 
LEFT JOIN `point` p 
ON u.id = p.user 
AND p.expiryDate > NOW() 
GROUP BY u.id
ORDER BY u.firstName ASC, u.lastName ASC, u.email ASC

The results come back like:

[ { id: 4834, firstName: 'F01', lastName: 'L01' },
  { id: 4835, firstName: 'F00', lastName: 'L00' }, // << notice F00 is after F01?
  { id: 4836, firstName: 'F02', lastName: 'L02' },
  { id: 4837, firstName: 'F03', lastName: 'L03' },
  { id: 4838, firstName: 'F04', lastName: 'L04' },
  { id: 4839, firstName: 'F05', lastName: 'L05' },
  { id: 4840, firstName: 'F06', lastName: 'L06' },
  { id: 4841, firstName: 'F07', lastName: 'L07' },
  { id: 4842, firstName: 'F08', lastName: 'L08' },
  { id: 4843, firstName: 'F09', lastName: 'L09' } ]

@lad2025 mentioned that it could be because of my incorrect use of GROUP BY in the comments. However, I checked the IDs match. Meaning F00 does belong to user ID 4835 for example.

Jiew Meng
  • 84,767
  • 185
  • 495
  • 805
  • 2
    The order of subquery is not guaranteed for outer query. The only way to sort things propely is to use `ORDER BY` in the most outer query. Why this happens: for example parallel execution, access to data in non-sequential order and so on (dependent on query optimizer) – Lukasz Szozda Dec 07 '15 at 07:54
  • 2
    To add to what @lad2025 already said, just add `ORDER BY u.firstName ASC, u.lastName ASC, u.email ASC` to the outer query and you should be good to go. – Tim Biegeleisen Dec 07 '15 at 07:58
  • 1
    There is second problem with your query: http://stackoverflow.com/questions/33629168/group-by-clause-in-mysql-and-postgresql-why-the-error-in-postgresql/33629201#33629201 Your `GROUP BY` and `SELECT` won't match. You should a) match column in both clauses b) add aggregation function to `SELECT` – Lukasz Szozda Dec 07 '15 at 08:03
  • 2
    @lad2025: I don't see what you mean. Jiew Meng groups by user and only shows user data plus an aggregate on point numbers. Looks fine to me. – Thorsten Kettner Dec 07 '15 at 08:07
  • 1
    @ThorstenKettner It is not about ordering but resultset itself. `The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate`. GROUP BY only id but SELECT firstName, lastName, email without explicit agg func. This **may** lead to undeterministic behaviour. Try the same syntax with SQL Server – Lukasz Szozda Dec 07 '15 at 08:08
  • 1
    @lad2025: id should be a unique key for table user, so firstName etc. are functionally dependent from it, i.e. you can only find *one* firstName for one id obviously. The SQL standard requires that each selected field is either in the GROUP BY or aggregated or functionally dependent from the group. This is the case here, so no problem. – Thorsten Kettner Dec 07 '15 at 08:13
  • @JiewMeng I mentioned about `GROUP BY` because using `GROUP BY` as you proposed is bad practice and you should use SQL Standard syntax. This in your example does not change the case because the `ID` is probably unique. – Lukasz Szozda Dec 07 '15 at 08:37
  • @lad2025, so it should work in my case right? Hmm not sure why even with the order by on the outer query it still does not work correctly ... as in update – Jiew Meng Dec 07 '15 at 08:45
  • 1
    @JiewMeng Please prepare http://sqlfiddle.com or try `SELECT * FROM (SELECT u.id, u.firstName, u.lastName, u.email, u.ssoProfile, u.vip, SUM(p.number) AS totalPoints FROM ( SELECT u.id FROM user u ORDER BY u.firstName ASC, u.lastName ASC, u.email ASC LIMIT 100, 10 ) u2 INNER JOIN user u ON u2.id = u.id LEFT JOIN point p ON u.id = p.user AND p.expiryDate > NOW() GROUP BY u.id) AS sub ORDER BY firstName ASC, lastName ASC, email ASC` – Lukasz Szozda Dec 07 '15 at 08:48
  • @JiewMeng can You try to run it one more because it really should work – Traveller Dec 07 '15 at 08:49
  • @lad2025, http://sqlfiddle.com/#!9/79dff3/1. Hope I did it correctly. I've also removed fields I think is unnecessary. – Jiew Meng Dec 07 '15 at 09:32
  • @JiewMeng Add also sample data to reproduce your case – Lukasz Szozda Dec 07 '15 at 09:33
  • @lad2025, whats the difference between the ordering the outer query as a subquery vs not? Just wanting to understand better. – Jiew Meng Dec 07 '15 at 09:33
  • @JiewMeng Will you update your fiddle with data for `id` 4834 to 4839? And check if query http://sqlfiddle.com/#!9/79dff3/6/0 gives you desired results – Lukasz Szozda Dec 07 '15 at 09:34
  • @lad2025 I dont get what you mean? But so far am unable to replicate the problem. In the actual code, the problem is also intermittent. – Jiew Meng Dec 07 '15 at 09:42
  • @JiewMeng I can't replicate your problem. Did you try wrapping with another subquery? – Lukasz Szozda Dec 07 '15 at 09:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/97179/discussion-between-jiew-meng-and-lad2025). – Jiew Meng Dec 07 '15 at 09:58
  • To clarify: you group by u.id and u.firstname is functionally dependent from u.id, so your query is standard compliant. Different DBMS however are not, for it is difficult to determine what is functionally dependent and what is not. MySQL for example would let you select `p.number` which is not functionally dependent from `u.id`, and so you'd get an arbitrarily chosen value (i.e. one of the user's numbers). SQL Server on the other hand forces you to either group by `p.number`, too, or use an aggregate function, e.g. `SUM(p.number)`. Your query is doing this right. Don't let others confuse you. – Thorsten Kettner Dec 07 '15 at 10:57

2 Answers2

1

Firstly, both @lad2025 and @Tim Biegeleisen gave correct comments. The query optimizer changes the query so the result set remains the same and the time remains as low as possible. The query optimizer changes the order of the commands such as JOINs and WHERE conditions.

These links can give You some useful information on how query optimisation works (link1 and link2).

In MySQL you can use the EXPLAIN command to see how the actual query is being executed after the query optimisation changes it.

Regarding solving Your issue you just have to duplicate Your ORDER BY in the outer query because that way you are telling the RDBMS that you want the result to be ordered as well.

My suspicion is the in the query reordering (take a look at this -> heuristic optimisation) changes something because you have the same table in the subquery as in the outer one (not the most accurate explanation, I know). This is very hard to determine because it depends on the indexes you have in place, the size of memory you have at execution time, size of tables an so on....

Your query should look something like this:

SELECT u.id, u.firstName, u.lastName, u.email, u.ssoProfile, u.vip, SUM(p.number) AS totalPoints 
FROM (
      SELECT u.id FROM user u 
      ORDER BY u.firstName ASC, u.lastName ASC, u.email ASC 
      LIMIT 100, 10
) u2 
INNER JOIN `user` u 
ON u2.id = u.id 
LEFT JOIN `point` p 
ON u.id = p.user 
AND p.expiryDate > NOW() 
GROUP BY u.id
ORDER BY u.firstName ASC, u.lastName ASC, u.email ASC 

One note I would add is that You should still retain the ORDER BY in the subquery because of the LIMIT condition as that is executed after the ORDER BY so you would end up with different results compared to the original query.

Traveller
  • 399
  • 2
  • 20
0

You have been given the correct answer already. You need an ORDER BY to guarantee ordered results. An order by in a subquery doesn't necessarily affect the order of your results. So the solution was to add

ORDER BY u.firstName, u.lastName, u.email

at the end of your query.

As you still get your data in a wrong order, this must be an additional error.

Most likely is that the query gives you the data correctly ordered, but this order is changed later. So maybe you show this in an app, maybe in a listbox or the like that re-orders the records. Check your query directly with your DBMS to see if you get the records ordered correctly.

Another possibility is blanks or non printable characters you didn't notice, i.e. maybe it is ' F01' not 'F01' and this is the reason for it coming before 'F00'.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73