-1

What's the best way to do this?

I'll describe below, but here is a setup at rextester.com data to play with- http://rextester.com/PXQOV60475

I have a table called "cases" that contains info on repair jobs basically. There is a field for the Case Manager called "case_mgr", and also a field for "case_mgr_first" because the Case Manager can be changed so this contains the original. There is another similar field for "who_last_called" which would be the user that last contacted the customer. Those all contain usernames... though "case_mgr_first" and "who_last_called" could be null ("case_mgr_first" is a new field and there may have been nobody that called).

To continue the repair job, the item to be repaired has to be received. Once it is received, the field "item_received_date" is set, otherwise it's null. There is also a date the record is created saved in the field "created_date".

So, the goal is to find the receiving % for users multiple ways. I'd like to find this receiving rate for the users as the current Case Manager ("case_mgr"), as "case_mgr_first", and as "who_last_called"... for a certain time period by the "created_date".

I already had a query for one of these, and it worked fine.

SELECT c.case_mgr AS case_mgr, COUNT(*) AS count_new, 
SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END) AS count_recd, 
SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END)*100/COUNT(*) AS percent_recd
FROM cases c WHERE (c.created_date >= '2017-05-01 00:00:00' AND c.created_date <= '2017-05-31 23:59:59') 
GROUP BY c.case_mgr ORDER BY c.case_mgr ASC

That gives me a result-

+-----------+-----------+------------+--------------+
| case_mgr  | count_new | count_recd | percent_recd |
+-----------+-----------+------------+--------------+
| bamm-bamm |        10 |          4 | 40.00        |
| barney    |       105 |         43 | 40.95        |
| betty     |       120 |         60 | 50.00        |
| fred      |       139 |         54 | 38.85        |
| wilma     |        97 |         56 | 57.73        |
+-----------+-----------+------------+--------------+

I do the same when going by "case_mgr_first".

SELECT c.case_mgr_first AS case_mgr_first, COUNT(*) AS count_new, 
SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END) AS count_recd, 
SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END)*100/COUNT(*) AS percent_recd 
FROM cases c WHERE (c.created_date >= '2017-05-01 00:00:00' AND c.created_date <= '2017-05-31 23:59:59') 
GROUP BY c.case_mgr_first ORDER BY c.case_mgr_first ASC

And that gives me a result-

+----------------+-----------+------------+--------------+
| case_mgr_first | count_new | count_recd | percent_recd |
+----------------+-----------+------------+--------------+
| NULL           |       137 |         62 | 45.26        |
| barney         |        84 |         44 | 52.38        |
| betty          |        72 |         37 | 51.39        |
| fred           |       116 |         47 | 40.52        |
| wilma          |        61 |         19 | 31.15        |
+----------------+-----------+------------+--------------+

(Notice that bamm-bamm appears in the first result but not in the second result, and there is a NULL entry in the second result.)

I would like a combined result that looks something like this (I removed the count_new and count_recd for easier reading)-

+-----------+-----------------------+-----------------------------+
|  user     | percent_recd_case_mgr | percent_recd_case_mgr_first |
+-----------+-----------------------+-----------------------------+
| NULL      | NULL                  | 45.26                       |
| bamm-bamm | 40.00                 | NULL                        |
| barney    | 40.95                 | 52.38                       |
| betty     | 50.00                 | 51.39                       |
| fred      | 38.85                 | 40.52                       |
| wilma     | 57.73                 | 31.15                       |
+-----------+-----------------------+-----------------------------+

I have gotten pretty close by using subqueries and joining these, and the users are combined correctly, but with the problem that with a LEFT JOIN I'm missing the result from the second query that doesn't appear in the first where the user is NULL, and with a RIGHT JOIN I'm missing the result from the first that's not in the second. Also, the query duration for it seems to just be the sum of the subqueries, which maybe can't be improved, I'm not sure.

Here the query I tried-

SELECT * FROM 
(
SELECT c.case_mgr AS case_mgr, SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END)*100/COUNT(*) AS percent_recd
FROM cases c WHERE (c.created_date >= '2017-05-01 00:00:00' AND c.created_date <= '2017-05-31 23:59:59') 
GROUP BY c.case_mgr
) a
LEFT JOIN
(
SELECT c.case_mgr_first AS case_mgr_first, SUM(CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END)*100/COUNT(*) AS percent_recd
FROM cases c WHERE (c.created_date >= '2017-05-01 00:00:00' AND c.created_date <= '2017-05-31 23:59:59') 
GROUP BY c.case_mgr_first 
) b
ON a.case_mgr = b.case_mgr_first 
ORDER BY a.case_mgr ASC

And here's the result-

+-----------+-----------------------+----------------+-----------------------------+
| case_mgr  | percent_recd_case_mgr | case_mgr_first | percent_recd_case_mgr_first |
+-----------+-----------------------+----------------+-----------------------------+
| bamm-bamm | 50.00                 | NULL           | NULL                        |
| barney    | 40.95                 | barney         | 52.38                       |
| betty     | 50.00                 | betty          | 51.39                       |
| fred      | 38.85                 | fred           | 40.52                       |
| wilma     | 57.73                 | wilma          | 31.15                       |
+-----------+-----------------------+----------------+-----------------------------+

I can do this with 2 queries and combine them in code, but it would be nice to have them in a query, and especially if performance can be improved somehow.

With a little more reading, I understand this is like a FULL OUTER JOIN in other SQL and doesn't exist in MySQL. It's emulated in MySQL with a UNION of a LEFT JOIN and a RIGHT JOIN. Ok, now I tried that, and it does work, but wow it takes .92 seconds (and it would be pretty bad with adding in another field like "who_last_called" I mentioned at the beginning). The original 2 queries each took about .22 seconds, and my first attempted JOIN took .50 seconds. There is an index on the "case_mgr" field, but not "case_mgr_first".

Any help or advice is appreciated! Is there a better way, or maybe should I stick with individual queries and put them together in code?

Michael K
  • 1,031
  • 2
  • 14
  • 27
  • See [Why should I provide an MCVE for what seems to me to be a very simple SQL query](http://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query) – Strawberry Jun 15 '17 at 22:36
  • @Strawberry I did search a lot for this question and saw some people used sqlfiddle. I knew of jsfiddle but didn't realize there was an sqlfiddle. So while I was writing this question, I tried going there to make an example but the site didn't work, it was completely blank despite several reloads in different browsers. Checking just now, it looks like it works again thankfully. – Michael K Jun 16 '17 at 14:05
  • Sqlfiddle's more user-friendly, but I prefer rextester. But, either way, that's the least important part of the linked answer. – Strawberry Jun 16 '17 at 14:11
  • @Strawberry Ok, I will add your #1 at that answer to this question as soon as I finish that as that looks like all I'm missing. – Michael K Jun 16 '17 at 14:16
  • @Strawberry I finished putting this together - http://rextester.com/PXQOV60475 . Also I made one for Patrick's answer below in a comment there which looks good. – Michael K Jun 16 '17 at 15:23

1 Answers1

1

I guess you could cook up a solution, where you do a UNION sub select and group again over that sub select, but it's not gonna be pretty:

SELECT tmp.mgr,
  SUM(tmp.percent_recd_case_mgr) AS percent_recd_case_mgr,
  SUM(tmp.percent_recd_case_mgr_first) AS percent_recd_case_mgr
FROM ((
  -- this the first part will basically contain the case_mgr data
  SELECT
    c.case_mgr AS mgr,
    SUM(
      CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END
    )*100/COUNT(*) AS percent_recd_case_mgr,
    0 AS percent_recd_case_mgr_first -- 0 as third column
  FROM cases c
  WHERE
    c.created_date >= '2017-05-01 00:00:00'
    AND c.created_date <= '2017-05-31 23:59:59'
  GROUP BY c.case_mgr ORDER BY c.case_mgr ASC
) UNION (
  -- And the second part contains the case_mgr_first data
  SELECT
    c.case_mgr_first AS mgr,
    0 AS percent_recd_case_mgr, -- 0 as second column
    SUM(
      CASE WHEN c.item_received_date IS NOT NULL THEN 1 ELSE 0 END
    )*100/COUNT(*) AS percent_recd_case_mgr_first
  FROM cases c
  WHERE
    c.created_date >= '2017-05-01 00:00:00'
    AND c.created_date <= '2017-05-31 23:59:59'
  GROUP BY c.case_mgr_first ORDER BY c.case_mgr_first ASC
)) AS tmp -- together both parts form a temp table and we sum again
          -- over all records
GROUP BY tmp.mgr
ORDER BY tmp.mgr ASC;
Patrick Günther
  • 320
  • 2
  • 10
  • This looks like it works great! I see how it works. It gets the correct output, and runs in the same time as a query with a single JOIN. And if I wanted to add in another field, it would just be another block in the UNION chain and an extra column. Maybe not pretty, but much better than the other method for emulating FULL OUTER JOIN in both time and messiness-level :) I edited your answer to add the "_first" to the end of the 3rd line. Also, here is a rextester.com example of your answer added onto the end of my original queries to show output- http://rextester.com/MDS93276 – Michael K Jun 16 '17 at 15:16