MySQL complex semi-join without group by

Question

Summary

I am looking for a semi-join(ish) query that selects a number of customers and joins their most recent data from other tables.

At a later time, I wish to directly append conditions to the end of the query: WHERE c.id IN (1,2,3)

Problem

As far as I am aware, my requirement rules out GROUP BY:

SELECT * FROM customer c
LEFT JOIN customer_address ca ON ca.customer_id = c.id
GROUP BY c.id
# PROBLEM: Cannot append conditions *after* GROUP BY!

With most subquery-based attempts, my problem is the same.

As an additional challenge, I cannot strictly use a semi-join, because I allow at least two types of phone numbers (mobile and landline), which come from the same table. As such, from the phone table I may be joining multiple records per customer, i.e. this is no longer a semi-join. My current solution below illustrates this.

Questions

The EXPLAIN result at the bottom looks performant to me. Am I correct? Are each of the subqueries executed only once? Update: It appears that DEPENDENT SUBQUERY is executed once for each row in the outer query. It would be great if we could avoid this.
Is there a better solution to what I am doing?

DDLs

DROP TABLE IF EXISTS customer;

CREATE TABLE `customer` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
);

DROP TABLE IF EXISTS customer_address;

CREATE TABLE `customer_address` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `customer_id` bigint(20) unsigned NOT NULL,
  `street` varchar(85) DEFAULT NULL,
  `house_number` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
  );

DROP TABLE IF EXISTS customer_phone; 
CREATE TABLE `customer_phone` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `customer_id` bigint(20) unsigned NOT NULL,
  `phone` varchar(32) DEFAULT NULL,
  `type` tinyint(3) unsigned NOT NULL COMMENT '1=mobile,2=landline',
  PRIMARY KEY (`id`)
  );

insert ignore customer values (1);
insert ignore customer_address values (1, 1, "OldStreet", 1),(2, 1, "NewStreet", 1);
insert ignore customer_phone values (1, 1, "12345-M", 1),(2, 1, "12345-L-Old", 2),(3, 1, "12345-L-New", 2);

SELECT * FROM customer;
+----+
| id |
+----+
|  1 |
+----+

SELECT * FROM customer_address;
+----+-------------+-----------+--------------+
| id | customer_id | street    | house_number |
+----+-------------+-----------+--------------+
|  1 |           1 | OldStreet |            1 |
|  2 |           1 | NewStreet |            1 |
+----+-------------+-----------+--------------+

SELECT * FROM customer_phone;
+----+-------------+-------------+------+
| id | customer_id | phone       | type |
+----+-------------+-------------+------+
|  1 |           1 | 12345-M     |    1 |
|  2 |           1 | 12345-L-Old |    2 |
|  3 |           1 | 12345-L-New |    2 |
+----+-------------+-------------+------+

Solution so far

SELECT *
FROM customer c

# Join the most recent address
LEFT JOIN customer_address ca ON ca.id = (SELECT MAX(ca.id) FROM customer_address ca WHERE ca.customer_id = c.id)

# Join the most recent mobile phone number
LEFT JOIN customer_phone cphm ON cphm.id = (SELECT MAX(cphm.id) FROM customer_phone cphm WHERE cphm.customer_id = c.id AND cphm.`type` = 1)

# Join the most recent landline phone number
LEFT JOIN customer_phone cphl ON cphl.id = (SELECT MAX(cphl.id) FROM customer_phone cphl WHERE cphl.customer_id = c.id AND cphl.`type` = 2)

# Yay conditions appended at the end
WHERE c.id IN (1,2,3)

Fiddle

This fiddle gives the appropriate result set using the given solution. See my questions above.

http://sqlfiddle.com/#!9/98c57/3

The `where` goes before the `group by`. What is so difficult about using the correct syntax? It is rather unclear what you are trying to do. A good way to explain a problem is by using sample data and desired results. And finally, what is your question? — Gordon Linoff, Jan 13 '16 at 12:58
The difficulty is that the query is to be put in a string constant, without conditions. The conditions are then appended by various different use cases. I do not wish to cut the query in twine to put the conditions in the correct place. So far we use a simple append for all our queries, and I do not wish to deviate for this one. — Timo, Jan 13 '16 at 13:01
Try appending HAVING instead of WHERE. Hopefully the optimiser will sort it out. Seems like you've got yourself into a bit of a hole framework wise though. — LoztInSpace, Jan 13 '16 at 13:05
Yeah, it is a limitation. Good call, but for various reasons I am really looking for a solution that can use `WHERE`. — Timo, Jan 13 '16 at 13:11
Why do you have to use WHERE? If your limitation is being able to append then you can't do it with WHERE. The WHERE has to come before GROUP. You can look and ask all you want, but you won't find a way around that :). Use HAVING and you're in with a chance. — LoztInSpace, Jan 13 '16 at 13:28
If you like, consider following this simple two-step course of action: 1. If you have not already done so, provide proper CREATE and INSERT statements (and/or an sqlfiddle) so that we can more easily replicate the problem. 2. If you have not already done so, provide a desired result set that corresponds with the information provided in step 1. — Strawberry, Jan 13 '16 at 13:30
@LoztInSpace My proposed solutions says we certainly *can* do it. Now I am just looking for possible alternatives, and for theoretical performance info. I am keeping `HAVING` in the back of my mind, but it is not great for me due to (off-topic) circumstance. — Timo, Jan 13 '16 at 13:32
I guess the other way is to wrap it all in a sub query and append a join to `(select 1 as wantedIds UNION ALL select 2 UNION ALL select 3) as X on (c.Id=X.WantedIds)` — LoztInSpace, Jan 13 '16 at 13:37
@Strawberry I have added the fiddle link at the end of my question. — Timo, Jan 13 '16 at 13:59

Paul Maxwell · Answer 1 · 2016-01-14T10:15:16.807

I would avoid those dependent subqueries, instead try this:

SELECT
      *
FROM customer c
      LEFT JOIN (
            SELECT
                  customer_id
                , MAX(id) AS currid
            FROM customer_phone
            WHERE type = 1
            GROUP BY
                  customer_id
      ) gm ON c.id = gm.customer_id
      LEFT JOIN customer_phone mobis ON gm.currid = mobis.id
      LEFT JOIN (
            SELECT
                  customer_id
                , MAX(id) AS currid
            FROM customer_phone
            WHERE type = 2
            GROUP BY
                  customer_id
      ) gl ON c.id = gl.customer_id
      LEFT JOIN customer_phone lands ON gl.currid = lands.id
WHERE c.id IN (1, 2, 3)
;

or, perhaps:

SELECT
      *
FROM customer c
      LEFT JOIN (
            SELECT
                  customer_id
                , MAX(case when type = 1 then id end) AS mobid
                , MAX(case when type = 2 then id end) AS lndid
            FROM customer_phone
            GROUP BY
                  customer_id
      ) gp ON c.id = gp.customer_id
      LEFT JOIN customer_phone mobis ON gp.mobid = mobis.id
      LEFT JOIN customer_phone lands ON gp.lndid = lands.id
WHERE c.id IN (1, 2, 3)
;

see: http://sqlfiddle.com/#!9/ef983/1/

Thank you for this suggestion. Unfortunately, the subquery is done first without any condition, so it performs a `GROUP BY` on the entire customer_phone table, rather than on a small subset of records. — Timo, Jan 20 '16 at 10:01
You have proof of that? If you can prove it, then fine, add a where clause into the subquery too. But before jumping to conclusions do find proof you need it. — Paul Maxwell, Jan 20 '16 at 10:56
Hehe, that is the problem: avoiding a `WHERE` in the middle of the query is what I am trying to avoid with this question. These tables will soon hold millions of records, like their predecessors, so a full table `GROUP BY` is out of the question. I have tried it on tables of such size to confirm that this is indeed what happens. — Timo, Jan 21 '16 at 08:47

MySQL complex semi-join without group by

1 Answers1