2

I am trying to understand the SQL self-join - especially how the order of the ON clause matters in the query. This is probably a basic question but please bear with me as I'm a beginner in query language.

This is actually a LeetCode Question - #181 where I'm trying to get the employee whose salary is higher than their manager. You can check out the schema through the LeetCode link or the SQL Fiddle example I've provided below.

Question:

Basically I'm trying to understand the difference in output when I run the below two queries:

I changed the order of the ON clause From (ON e.ManagerId = m.Id) to (ON m.ManagerId = e.Id) and I'm getting the inverse answer from the desired output. I thought because it's a self-join, the order wouldn't matter since I'm extracting information from the identical table.

Please let me know what I'm missing and also point to any directions if possible! Thanks in advance!

1) Correct Query to get Desired Output

Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary

SQL Fiddle Example

2) Incorrect Query

Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary

SQL Fiddle Example

  • But I'm getting a totally different output if I run the second query with the ON conditions reversed. – PotentialLime Jan 19 '20 at 23:57
  • 1
    . . I didn't look close enough. You are doing two different `JOIN`s. The queries are not related at all, so you should not expect the results to be the same. – Gordon Linoff Jan 19 '20 at 23:59
  • Can you explain more about how the queries aren't related at all? Are they not related because of the presence of null values in the table to be self-joined? Apologies if I'm not making much sense - My thinking was that since `e` and `m` are the aliases of the same table, the resulting output would be the same – PotentialLime Jan 20 '20 at 00:06
  • 1
    . . I think Thorsten has done a good job of that. – Gordon Linoff Jan 20 '20 at 01:40
  • Please put everything necessary to ask your question in the post itself, not in a link. Please in code questions give a [mre]--cut & paste & runnable code; example input (as initialization code) with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For errors that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL that includes DBMS & DDL, which includes constraints & indexes & tabular initialization. – philipxy Jan 20 '20 at 02:42

4 Answers4

3

In both queries you are joining one employee with another. In the first query, however, you call the subordinate e and the manager m, while in the second you call the manager e and the subordinate m. Let's look at this more closely:

Select *
FROM Employee e
INNER JOIN Employee m
ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary

You are combining an employee (that you call e for short) with their manager (an employee called m here, the ID of which is referenced as the manager ID in the employee record). Then you only keep employee / manager pairs where the employee's salary is greater than the manager's.

Select *
FROM Employee e
INNER JOIN Employee m
ON m.ManagerId = e.Id
WHERE e.Salary > m.Salary

You are combining an employee (that you call e for short) with their subordinate (an employee called m here, the manager ID of which is referencing the employee record). So, the employee that you call e is the other employee's manager. Then you only keep employee (manager) / subordinate pairs where the manager's salary is greater than the subordinate's.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
3

The functionally the order doesn't matter (so, 'ON e.ManagerId = m.Id' is the same as 'ON m.Id = e.ManagerId').

What you are doing here is joining on different columns, which represent different things.

In the incorrect query, you are saying "the managers managerID is the same as the employees ID", which isn't true. Managers (as you've got it in your table) don't have managers themselves.

What you've essentially done is inverse the join. If you were to swap your sign around in you where statement, so WHERE e.Salary > m.Salary to WHERE e.Salary < m.Salary you'd get the same answer as your correct query

LordBaconPants
  • 1,404
  • 1
  • 19
  • 22
  • Thanks for the explanation. The semantic explanation ("the managers managerID is the same as the employees ID") made the most sense for me! – PotentialLime Jan 20 '20 at 02:17
2

I think you are not realizing that the table alaises refer to the people. The m copy of the table is the manager, so the variable m.managerId would refer to the manager of the manager. Which is not what you want. So the correct link:

e.ManagerId = m.Id

is linking an employee row's manager to the manager row's ID.

Jon Wilson
  • 726
  • 1
  • 8
  • 23
1

You might want to think of it as only the ids in the Manager_id columns are Managers.

So to get their names you could do:

select name from Employee where id in (select distinct ManagerId from Employee)

  • distinct is optional. I would tend to do it, if I was debegging the nested select, as it would make sense to only see same ManagerId in there once. Two+ Employees can potentially have the same Manager.

Try running the queries without the where clause, you will see the same results but column order has switched. This is because of the ON clause:

  • ON e.ManagerId = m.Id

    (Employee e to Manager m)

    Or longhand join Employee ManagerId, to Manager Id

    Joe as the Employee, with Sam as the Manager (ascending hierarchy as you read across columns)

  • ON m.ManagerId = e.Id

    (Manager m to Employee e)

    Or longhand join Manager ManagerId, to Employee Id

    Sam as the Manager, with Joe as the Employee (descending hierarchy as you read across columns)

Column order not withstanding, if you were to flip the WHERE clause from > to <= when you flipped the ON prefix order you would yield the same results.

JGFMK
  • 8,425
  • 4
  • 58
  • 92
  • I see. thanks for the explanation! Regarding flipping the WHERE clause, I do understand that I'll get the same results once I flip the inequality, but I guess it wouldn't make semantic sense right? (Since I'm trying to get the instance where employee Salary is greater than that of the manager) - if I flip the inequality, it will look like I'm getting the instance where manager salary is greater than that of the employee. – PotentialLime Jan 20 '20 at 02:27
  • Exactly.. Semantically the first way makes far more sense for your scenario. And 99% of the time it's the way SQL is written. But, you could also see it from the perspective of the Employee needing to know their Manager (*ascending*) or the Manager needing to know their underling Employees (*descending*), in which case the latter would make sense. – JGFMK Jan 20 '20 at 10:17