0

I have three tables: customers, orders and refunds. Some customers (not all) placed orders and for some orders (not all) there were refunds.

When I join the three tables like this (the details are not that important):

SELECT ...
FROM customers LEFT JOIN orders
ON customers.customer_id=orders.customer_id    
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
//WHERE order_id IS NOT NULL;// uncomment to filter out customers that have no orders

I get a big table in which all customers are listed (even the ones that have not placed any orders and they have NULL in the 'order_id' column), with all their orders and the orders' refunds (even if not all orders have refunds):

NAME     ORDER_ID    ORDER AMOUNT   REFUND
------------------------------------------------------------
Natalie   2         12.50           NULL
Natalie   3         18.00           18.00
Brenda    4         20.00           NULL
Adam      NULL      NULL            NULL    

Since I only want to see only customers that have placed orders, i.e in this case I want to filter Adam from the table, I uncomment the 'WHERE' row from the SQL query above. This yields the desired result. My question is: On which table is the WHERE executed - on the original 'orders' table (which has no order_id that is NULL) or on the table that is result of the JOINs? Apparently it is the latter, but just want to make sure, since it is not very obvious from the SQL syntax and it is a very important point.

Thank you

user1523271
  • 1,005
  • 2
  • 13
  • 27
  • Why not fllow through the execution plan and see where it does it? Though I do agree, it tends to happen at the end table. – Elias Sep 19 '13 at 12:55

3 Answers3

0

In this case, you're making SQL work harder than it has to. It is operating on the results (likely a MERGE event, or something along those lines).

There's a chance SQL is realizing what you're doing and optimizing the plan and changing to an INNER JOIN for you. But I can't be certain (and neither can SQL -- it can change how it optimizes over time).

In the case where you only want where an order is there, use an INNER JOIN instead. SQL will be much more efficient at this.

SELECT ...
FROM customers
INNER JOIN orders
ON customers.customer_id=orders.customer_id    
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
Eli Gassert
  • 9,745
  • 3
  • 30
  • 39
  • You are absolutely right and I actually tried this. But I also run this code (from a book, actually) for educational purposes. – user1523271 Sep 19 '13 at 12:58
  • Even so, I think I answered the question with "it depends." SQL is smart. It can often solve for things based on optimizations. You'll only know for sure if you run and analyze the plan. And even that will only give you a current pass at the optimization. If SQL changes its statistics and thinks doing it another way will yield better results, it'll re-optimize the query later. You're best off being as explicit in your intentions as possible to "guarantee" a plan. – Eli Gassert Sep 19 '13 at 13:02
  • I found the answer on another thread (http://stackoverflow.com/questions/13132447/difference-between-on-and-where-clauses-in-sql-table-joins). – user1523271 Sep 19 '13 at 13:30
0

You can change the LEFT JOIN as INNER JOIN to eliminate customers which don't have any order

SELECT ...
FROM customers INNER JOIN orders
ON customers.customer_id=orders.customer_id    
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
Ouscux
  • 187
  • 11
0

It's because you're using LEFT JOIN, which will return all rows from the left hand table, in your case this is the Customer Table, and return NULL where no corresponding values appear in the right hand tables.

Just rewrite it using inner joins, so only rows where matching data is found will be returned.

SELECT ...
FROM customers 
INNER JOIN orders
    ON customers.customer_id=orders.customer_id    
INNER JOIN refunds
    ON orders.order_id=refunds.order_id;
Tanner
  • 22,205
  • 9
  • 65
  • 83