What's the best way to do this?
I'll describe below, but here is a setup at rextester.com data to play with- http://rextester.com/PXQOV60475
I have a table called "cases" that contains info on repair jobs basically. There is a field for the Case Manager called "case_mgr", and also a field for "case_mgr_first" because the Case Manager can be changed so this contains the original. There is another similar field for "who_last_called" which would be the user that last contacted the customer. Those all contain usernames... though "case_mgr_first" and "who_last_called" could be null ("case_mgr_first" is a new field and there may have been nobody that called).
To continue the repair job, the item to be repaired has to be received. Once it is received, the field "item_received_date" is set, otherwise it's null. There is also a date the record is created saved in the field "created_date".
So, the goal is to find the receiving % for users multiple ways. I'd like to find this receiving rate for the users as the current Case Manager ("case_mgr"), as "case_mgr_first", and as "who_last_called"... for a certain time period by the "created_date".
I already had a query for one of these, and it worked fine.
SELECT c.case_mgr AS case_mgr, COUNT(*) AS count_new,
SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END) AS count_recd,
SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END)*100/COUNT(*) AS percent_recd
FROM cases c WHERE (c.created_date >= '2017-05-01 00:00:00' AND c.created_date <= '2017-05-31 23:59:59')
GROUP BY c.case_mgr ORDER BY c.case_mgr ASC
That gives me a result-
+-----------+-----------+------------+--------------+
| case_mgr | count_new | count_recd | percent_recd |
+-----------+-----------+------------+--------------+
| bamm-bamm | 10 | 4 | 40.00 |
| barney | 105 | 43 | 40.95 |
| betty | 120 | 60 | 50.00 |
| fred | 139 | 54 | 38.85 |
| wilma | 97 | 56 | 57.73 |
+-----------+-----------+------------+--------------+
I do the same when going by "case_mgr_first".
SELECT c.case_mgr_first AS case_mgr_first, COUNT(*) AS count_new,
SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END) AS count_recd,
SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END)*100/COUNT(*) AS percent_recd
FROM cases c WHERE (c.created_date >= '2017-05-01 00:00:00' AND c.created_date <= '2017-05-31 23:59:59')
GROUP BY c.case_mgr_first ORDER BY c.case_mgr_first ASC
And that gives me a result-
+----------------+-----------+------------+--------------+
| case_mgr_first | count_new | count_recd | percent_recd |
+----------------+-----------+------------+--------------+
| NULL | 137 | 62 | 45.26 |
| barney | 84 | 44 | 52.38 |
| betty | 72 | 37 | 51.39 |
| fred | 116 | 47 | 40.52 |
| wilma | 61 | 19 | 31.15 |
+----------------+-----------+------------+--------------+
(Notice that bamm-bamm appears in the first result but not in the second result, and there is a NULL entry in the second result.)
I would like a combined result that looks something like this (I removed the count_new and count_recd for easier reading)-
+-----------+-----------------------+-----------------------------+
| user | percent_recd_case_mgr | percent_recd_case_mgr_first |
+-----------+-----------------------+-----------------------------+
| NULL | NULL | 45.26 |
| bamm-bamm | 40.00 | NULL |
| barney | 40.95 | 52.38 |
| betty | 50.00 | 51.39 |
| fred | 38.85 | 40.52 |
| wilma | 57.73 | 31.15 |
+-----------+-----------------------+-----------------------------+
I have gotten pretty close by using subqueries and joining these, and the users are combined correctly, but with the problem that with a LEFT JOIN I'm missing the result from the second query that doesn't appear in the first where the user is NULL, and with a RIGHT JOIN I'm missing the result from the first that's not in the second. Also, the query duration for it seems to just be the sum of the subqueries, which maybe can't be improved, I'm not sure.
Here the query I tried-
SELECT * FROM
(
SELECT c.case_mgr AS case_mgr, SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END)*100/COUNT(*) AS percent_recd
FROM cases c WHERE (c.created_date >= '2017-05-01 00:00:00' AND c.created_date <= '2017-05-31 23:59:59')
GROUP BY c.case_mgr
) a
LEFT JOIN
(
SELECT c.case_mgr_first AS case_mgr_first, SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END)*100/COUNT(*) AS percent_recd
FROM cases c WHERE (c.created_date >= '2017-05-01 00:00:00' AND c.created_date <= '2017-05-31 23:59:59')
GROUP BY c.case_mgr_first
) b
ON a.case_mgr = b.case_mgr_first
ORDER BY a.case_mgr ASC
And here's the result-
+-----------+-----------------------+----------------+-----------------------------+
| case_mgr | percent_recd_case_mgr | case_mgr_first | percent_recd_case_mgr_first |
+-----------+-----------------------+----------------+-----------------------------+
| bamm-bamm | 50.00 | NULL | NULL |
| barney | 40.95 | barney | 52.38 |
| betty | 50.00 | betty | 51.39 |
| fred | 38.85 | fred | 40.52 |
| wilma | 57.73 | wilma | 31.15 |
+-----------+-----------------------+----------------+-----------------------------+
I can do this with 2 queries and combine them in code, but it would be nice to have them in a query, and especially if performance can be improved somehow.
With a little more reading, I understand this is like a FULL OUTER JOIN in other SQL and doesn't exist in MySQL. It's emulated in MySQL with a UNION of a LEFT JOIN and a RIGHT JOIN. Ok, now I tried that, and it does work, but wow it takes .92 seconds (and it would be pretty bad with adding in another field like "who_last_called" I mentioned at the beginning). The original 2 queries each took about .22 seconds, and my first attempted JOIN took .50 seconds. There is an index on the "case_mgr" field, but not "case_mgr_first".
Any help or advice is appreciated! Is there a better way, or maybe should I stick with individual queries and put them together in code?