4

Possible Duplicate:
Explicit vs implicit SQL joins
SQL JOIN: is there a difference between USING, ON or WHERE?

I'm going over code maintained by a developer who was not very familiar with SQL. I see snippets such as the following quite frequently in his code:

SELECT *
FROM person, status
WHERE person.status_id = status.id

I've suggested to him that he use the following instead:

SELECT *
FROM person
INNER JOIN status ON status.id = person.status_id

He pointed to the fact that, in this particular case, both queries returned identical results in an identical time frame (34k rows in 67 ms). The fact that my new query didn't change anything in this case is evidence to him that there is nothing wrong with this method. I've tried explaining cartesian products and such to him, but he insists that there is nothing wrong with this method. Can someone help provide negative examples of where relying on this would fail, and/or why this line of querying is dangerous from an implementation perspective?

Community
  • 1
  • 1
rybosome
  • 5,046
  • 7
  • 44
  • 64
  • 2
    The only issue I see here is one of explicit vs implicit joins. Any optimizer will see both of these as the same query. – Oded Dec 20 '11 at 20:52
  • Run both of them through some kind of query analysis tool to see if you can get the execution paths to compare. If they're different, that might help you come up with examples to exploit that difference. Honestly, to me, the only downside I've ever encountered with the former is that I find it less intuitive to read and understand. It takes an extra cognitive step or two to picture the cartesian structure in my head, which adds an extra step to maintaining code where something can go wrong. – David Dec 20 '11 at 20:55
  • ANSI joins make for clearer code; show him some `LEFT JOIN`s vs. old `(+)` syntax, or some queries that join half a dozen tables with table-adjacent `ON` clauses vs. old far-from-the-table `WHERE` clauses, and he'll see that for himself. But there's nothing objectively wrong with old-style joins. – ruakh Dec 20 '11 at 20:56
  • The first query would only work in MySQL, while the second one would work in most popular DBMS's. – WWW Dec 20 '11 at 20:57
  • 1
    @Crontab That's not true, actually. For example, the former style works just fine on MS SQL Server. What "popular DBMS" doesn't support old-style joins? – Dan J Dec 20 '11 at 21:01
  • The main risk of the implicit (ANSI 89) style IMO is forgetting to include the `JOIN` condition and getting an accidental cartesian join. This is impossible with the `JOIN` keyword as the `ON` is required by the syntax. – Martin Smith Dec 20 '11 at 21:02
  • 1
    It is good to think practically, (the way you do,) because in more complex situations even the query optimizer of the RDBMS can get confused and produce sub-optimal results. It will be very difficult for anyone here to provide examples. But if that guy has been writing lots of queries with that paleolithic kind of approach, then look at the most complex queries he has written in your entire system and it is very likely that you will find some that can be greatly improved by rewriting them in a more rational way. – Mike Nakis Dec 20 '11 at 21:03
  • @djacobson: I honestly never realized that. I used MSSQL in a professional environment for 5 years and I 1) never wrote my queries that way, and 2) never saw anyone else who did either. Learn something new every day. – WWW Dec 20 '11 at 21:08
  • 1
    @Crontab No worries, but you've given me the opportunity to deploy one of my favorite chestnuts: *the absence of evidence is not evidence of absence.* ;) – Dan J Dec 20 '11 at 21:42
  • 1
    SQL server does not work for implied left joins though not even as far back as SQL server 2000 where Books Online will tell you that it is inconsisistently evaluated. – HLGEM Dec 20 '11 at 21:45
  • 1
    Ask him if he would use syntax that was replaced 20 years ago in any other language? – HLGEM Dec 20 '11 at 21:46

3 Answers3

11

It's true that both forms of syntax should give the same result and internally MySQL executes them both in exactly the same way. Current versions of the SQL standard support both forms, although the comma-style is supported only for the sake of backward compatibility.

There is a case where using the comma-style syntax fails, but it's exotic:

SELECT * FROM A, B JOIN C ON C.x = A.y;

The JOIN operator has higher precedence than the comma. So as the above query is trying to evaluate C.x = A.y it doesn't even know that A is part of the query. So you get an error:

ERROR 1054 (42S22): Unknown column 'A.y' in 'on clause'

The best remedy is to use the JOIN syntax consistently instead of mixing them.

Also you can't make outer joins with the comma-join syntax. Oracle and Sybase/Microsoft each invented their own proprietary syntax for handling outer joins, but neither are supported by other brands of RDBMS. Today, all current versions of RDBMS including Oracle and Sybase/Microsoft support the standard JOIN syntax, so there's no good reason to use the legacy vendor-specific extensions.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

This is just a syntax variant. Both are join specifications. The first is simply implicit. The second is simply explicit. Early SQL standards didn't have the JOIN keyword.

Can someone help provide negative examples of where relying on this would fail,

No.

and/or why this line of querying is dangerous from an implementation perspective?

It isn't. It's just an older (but still perfectly valid) syntax.

S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • 4
    As in other code, I still rather see the _explicit_ version over the implicit. – Oded Dec 20 '11 at 20:53
  • 1
    It is more risky because more risk of foobars and semantic differences, especially with outer joins. Example (read comments): http://stackoverflow.com/a/5654338/27535 – gbn Dec 20 '11 at 20:53
  • 1
    Outer joins have a radically different "legacy" syntax. There's still no ambiguity. Explicit **is** more clear. But there's no technical difference. – S.Lott Dec 20 '11 at 20:57
  • 3
    Actually -1 because it *is* dangerous. If you use explicit outer join with implicit inner join, then *how do you know processing order*? – gbn Dec 20 '11 at 20:57
  • @gbn: It is? Really? Can you provide an example or evidence or something? – S.Lott Dec 20 '11 at 20:58
  • Logical query processing order is FROM before WHERE. Ergo, semantically bollocks query. Simple. No example needed. – gbn Dec 20 '11 at 21:00
  • @gbn: "FROM before WHERE."? What? That makes approximately no sense. The order of the clauses is merely a syntax convenience. There's no essential ordering between FROM and WHERE. The statement -- as a whole -- is used to generate a query plan. Do you have an example of a RDBMS which somehow magically fails to work correctly with an older-style SELECT syntax? – S.Lott Dec 20 '11 at 21:02
  • Just read the link I posted and stop trying to be smart. Oh, and "from before where"? Read MSDN http://stackoverflow.com/a/7595153/27535 (MySQL has the same logical processing order)- – gbn Dec 20 '11 at 21:06
  • The SO answer seemed clear to me. "Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list." The FROM before WHERE issue doesn't actually exist. Can you provide an actual example? – S.Lott Dec 20 '11 at 21:08
  • 3
    @S.Lott - The position of the `ON` clause in `ANSI-92` style joins controls the logical order of evaluation of the joins. This logical order can make a difference for outer joins. – Martin Smith Dec 20 '11 at 21:13
  • 3
    -1 : The older format leads to more complexity and more resource hogging when you make a minor error with where you put the (+) for an OUTER join. If you value clarity of code and software engineering practices, you can only possibly favour ANSI-92. In my opinion. – MatBailie Dec 20 '11 at 21:18
  • 1
    Another -1: The potential for problems with the older implicit syntax is too great not to discourage its use. – JNK Dec 20 '11 at 21:27
  • 2
    [An example where mixing explicit outer joins and implicit inner joins makes a difference to the results](http://data.stackexchange.com/stackoverflow/q/122274/) – Martin Smith Dec 20 '11 at 21:32
  • 1
    @gbn: The order of execution is implementation-specific. In the case of MySQL (which the OP tagged his question), conditions in an `ON` clause are reorderable with conditions in the `WHERE` clause. – Bill Karwin Dec 20 '11 at 21:57
  • @MartinSmith: Both of your queries provide *explicit* JOIN operations with explicit ON. An interesting point. But utterly irrelevant to the question where implicit (non-JOIN) syntax is compared with explicit syntax. – S.Lott Dec 20 '11 at 22:27
  • @Dems: Minor outer-join syntax errors aren't the issue here. Minor errors are **always** a problem. The issue here is just implicit (non-JOIN) syntax is compared with explicit syntax. Nothing more. – S.Lott Dec 20 '11 at 22:27
  • @JNK: That's not a technical reason. I happen to agree that explicit JOIN is easier to read. But the question looks for technical reasons. There aren't any. – S.Lott Dec 20 '11 at 22:28
  • 1
    @S.Lott - But gbn's original complaint was about mixing "explicit outer join with implicit inner join," and you asked for an example. – Martin Smith Dec 20 '11 at 22:32
  • @MartinSmith: No. The question is about explicit vs. implicit join. Nothing more complex than that. My request was for an example where the "Logical query processing order is FROM before WHERE. Ergo, semantically bollocks query". The statement appears utterly false, since the actual ("physical") query plan has nothing to do with "FROM before WHERE". That's the what I want clarification on, since it seems utterly false. – S.Lott Dec 20 '11 at 22:42
  • @BillKarwin: that leads to ambiguity then, surely? Which is the point I'm trying to make... – gbn Dec 21 '11 at 05:27
  • @gbn: No, I don't think there's any logical ambiguity. At least when using *inner* joins, `ON x=y WHERE a=b` yields the same results as `WHERE x=y AND a=b`. – Bill Karwin Dec 21 '11 at 06:01
0

I think it is about readability. Consider this:

SELECT *
FROM person
INNER JOIN status ON status.id = person.status_id where person.sex=1 or status.state='PA'

As opposed to:

SELECT *
FROM person, status
WHERE person.status_id = status.id and person.sex=1 or status.state ='PA'
Michael Fredrickson
  • 36,839
  • 5
  • 92
  • 109
PedroC
  • 111
  • 3