0

I'm trying for a while to get a query running, but it is more difficult than I thought. I have the following 3 tables:

Stores:
+--+-------+--------------+-------+---------+---------+----+---------------+---+------+
|ID|Company|Address       |PostCod|Latitude |Longitude|Stat|   Division    |Gro| When |
+--+-------+--------------+-------+---------+---------+----+---------------+---+------+
|17|Company|Site Address 3|WF1 5NT|53.666340|-1.487857|OPEN|Test Division 2|arl|2014-01-31 14:36:04
|18|Company|Site Address 3|WF1 5NT|53.666340|-1.487857|OPEN|Test Division 2|arl|2014-01-31 14:36:04
|19|Company|Site Address 3|WF1 5NT|53.666340|-1.487857|OPEN|Test Division 2|arl|2014-01-31 14:36:04
|20|Company|Site Address 3|WF1 5NT|53.666340|-1.487857|OPEN|Test Division 2|arl|2014-01-31 14:36:04
+--+-------+--------------+-------+---------+---------+----+---------------+---+------+

Jobs:
+--------+-------------------------------------------+-------+------------+------+--------+
| client | description                               | freq  | from       | till |job_id |
+--------+-------------------------------------------+-------+------------+------+--------+
|     17 | Weekly external and internal window clean |  7    | 2013-10-01 | NULL |    17 |
|     18 | Weekly external and internal window clean |  7    | 2013-10-01 | NULL |    18 |
|     19 | Weekly external and internal window clean |  7    | 2013-10-01 | NULL |    19 |
|     20 | Weekly external and internal window clean |  7    | 2013-10-01 | NULL |    20 |
|     17 | 4 weekly fascia and upper floor windows   | 28    | 2013-10-01 | NULL |   645 |
|     18 | 4 weekly fascia and upper floor windows   | 28    | 2013-10-01 | NULL |   646 |
|     19 | 4 weekly fascia and upper floor windows   | 28    | 2013-10-01 | NULL |   647 |
|     20 | 4 weekly fascia and upper floor windows   | 28    | 2013-10-01 | NULL |   648 |
+--------+-------------------------------------------+-------+------------+------+--------+


Job_hist
+------+-------------------+-----------------+---------+------+--------------+-----------------------------------------+-------------------------------+------+-----+-----------------------+------+
|Job_id|       last        |       user      | Company | LINK | Signedy_by   |       Job Description                   |          Address              | Accu |Days | Possition             |Uniqid|
+------+-------------------+-----------------+---------+------+--------------+-----------------------------------------+-------------------------------+------+-----+-----------------------+------+
|17    |2013-10-01 09:35:37|ARL_Operative_013|Santander| LINK |tony moore    |Weekly external window clean             | AN WELLING 14 BR (DA16 3PP)   |739.00|10   |Branch Manager         |132   |
|20    |2013-10-02 12:27:51|ARL_Operative_013|Santander| LINK |alex goodman  |Weekly external window clean             | AN HAROLD HILL 69 FR (RM3 8XA)|55.00 |6    |Store Assistant Manager|268   |
|19    |2013-10-03 09:14:19|ARL_Operative_013|Santander| LINK |darren pickett|Weekly external window clean             | AN WOOLWICH 41 PS (SE18 6JD)  |50.00 |5    |Other                  |332   |
|18    |2013-10-03 09:54:49|ARL_Operative_013|Santander| LINK |james lawrence|Weekly external window clean             | AN ELTHAM 73 EHS (SE9 1UW)    |49.00 |7    |Other                  |346   |
|17    |2013-10-08 09:05:16|ARL_Operative_013|Santander| LINK |tony moore    |Weekly external and internal window clean| AN WELLING 14 BR (DA16 3PP)   |67.00 |6    |Branch Manager         |697   |
+------+-------------------+-----------------+---------+------+--------------+-----------------------------------------+-------------------------------+------+-----+-----------------------+------+

And I need to write a query which does some performance calculation based on divisions. Goal is to:

  1. Calculate how many cleans where scheduled in a given time frame, (simple version) take start date and end date, calculate numbers of day and divide with stores.freq

  2. How many of the scheduled cleans where done on time in this period. We take this data from job_hist (where job_hist.jod_id=jobs.job_id and job_hist.last falls in the interval and job_hist.days <= freq

  3. Percentage of the cleans done on time

  4. How many cleans where done late (same logic like in point 2 just job_hist> freq)

  5. Percentage of the cleans done late

  6. How many where missed (scheduled - done on time - done late)

  7. Percentage of missed

All this should be grouped by division, so result should look something like this:

+------------------------------+---------+------+-----------------+---------+--------------------+------+-----------------+
|division                      |scheduled|ontime|ontime_percentage|completed|completed_percentage|missed|missed_percentage
+------------------------------+---------+------+-----------------+---------+--------------------+------+-----------------+
|fsdfoihsdfljksdlgjdfsligsgfsfd|16282    |10404 |63.90%           |10825    |66.48%              |5457  |33.52%
|Test Division 2               |259      |129   |49.81%           |133      |51.35%              |126   |48.65%
|Test Division 3               |30       |15    |50.00%           |15       |50.00%              |15    |50.00%
+------------------------------+---------+------+-----------------+---------+--------------------+------+-----------------+

Now I'm not too good with SQL queries, but I have managed to put together the following query:

SELECT `stores`.`division`, 
SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) as `scheduled`, 
count(`name`)  as `ontime`, 
concat(round(( count(`name`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS ontime_percentage,
 count(`user`) as `completed`,
concat(round(( count(`user`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS completed_percentage,
(SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-count(`user`)) as `missed`,
concat(round(( (SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-count(`user`)) /SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS missed_percentage
from `stores` left join `jobs` on `stores`.`id`=`jobs`.`client`
left join (select `user`,`job_id` from `job_hist` where  `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till') as `myquery` on `myquery`.`job_id`=`jobs`.`job_id`
left join (select `name`,`job_hist`.`job_id` from `job_hist` left join `jobs` on `jobs`.`job_id`=`job_hist`.`job_id` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till' and `job_hist`.`days`<=`jobs`.`freq`) as `myquery2` on `myquery2`.`job_id`=`jobs`.`job_id`
where `stores`.`owner`='$group'
group by `division`

The $till, $from, $group are some PHP variables, $till and $from are dates in string formats and $group is a string.

Now the query runs nicely, BUT (there is always a but), the Scheduled field is summing up different values, what I mean sometimes it doubling the value, sometimes it's tripling it.

I have read THIS but I still wasn't able to get my head around it. I more or less understand that I'm some of the results are doubled, but still cannot figure it out what do I need to change.

Before anybody suggest to join on division between stores.division and job_hist.division that is not viable, since divisions can change in time, and they get updated in stores but not in job_hist.job_hist should not be modified at any time, that is only containing historical data.

Update

OK, I managed to tweak my query and till now it looks that I'm getting the right data, but I need to run more tests to be sure before I post my own answer. Anyway here is my updated query:

SELECT `stores`.`division`, 
SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)   as `scheduled`,
sum(`ontime`), 
concat(round(( sum(`ontime`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS ontime_percentage, 
sum(`completed`-`ontime`), 
concat(round(( sum(`completed`-`ontime`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS completed_percentage, 
(SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-sum(`completed`)) as `missed`, 
concat(round(( (SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-sum(`completed`)) /SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS missed_percentage from `stores` 
left join `jobs` on `stores`.`id`=`jobs`.`client` 
left join (select count(`user`) as `completed`,`job_id` from `job_hist` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till' group by `job_id`) as `myquery` on `myquery`.`job_id`=`jobs`.`job_id` 
left join (select count(`name`) as `ontime`,`job_hist`.`job_id` from `job_hist` left join `jobs` on `jobs`.`job_id`=`job_hist`.`job_id` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till' and `job_hist`.`days`<=`jobs`.`freq` group by `job_id`) as `myquery2` on `myquery2`.`job_id`=`jobs`.`job_id` 
where `stores`.`owner`='$group' 
group by `stores`.`division`

I'd be open to a better way of running this query.

halfer
  • 19,824
  • 17
  • 99
  • 186
Emil Borconi
  • 3,326
  • 2
  • 24
  • 40
  • Why do you have the same store listed multiple times? – Strawberry Mar 10 '14 at 15:16
  • In the stores table you mean? It's just test data which I dumped in the table. But the id's are unique so they are different stores... – Emil Borconi Mar 10 '14 at 15:20
  • In reality this will not look like this, I was to "lazy" to write so much so I inserted one store and just copied over and over several times... In real life this will not happen, there is a part of the script which validates that there cannot be 2 stores from same company with same latitude and longitude... but that is a totally different chapter... – Emil Borconi Mar 10 '14 at 15:25
  • But the same job description will appear several times in the job table - in connection with separate properties? – Strawberry Mar 10 '14 at 15:38
  • Yes, that is happening even in reality. You can have exactly the same job description for multiple stores. In some cases with the exact same freq, in other cases different freq, and so. Job Description is a free text field. Each store can have multiple jobs associated, with a free descriptions, and with any integer >0 in the freq field. – Emil Borconi Mar 10 '14 at 15:42
  • So could you have a generic `jobs` table? And another table that stores which job relates to which store - and how often? – Strawberry Mar 10 '14 at 15:43
  • Yes I could..., but to be honest I will prefer not to change my table structure now. I written quite a lot of PHP / HTML / Javascript code which is based on this table structure.... – Emil Borconi Mar 10 '14 at 15:46
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/49412/discussion-between-emil-borconi-and-strawberry) – Emil Borconi Mar 10 '14 at 15:47

1 Answers1

0

The problem was on the grouping so here is the query which returns the data correctly:

SELECT `stores`.`division`, 
SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)   as `scheduled`,
sum(`ontime`), 
concat(round(( sum(`ontime`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS ontime_percentage, 
sum(`completed`)-sum(`ontime`), 
concat(round(( (sum(`completed`)-sum(`ontime`))/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS completed_percentage, 
(SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-sum(`completed`)) as `missed`, 
concat(round(( (SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-sum(`completed`)) /SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS missed_percentage from `stores` 
left join `jobs` on `stores`.`id`=`jobs`.`client` 
left join (select count(`user`) as `completed`,`job_id` from `job_hist` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till' group by `job_id`) as `myquery` on `myquery`.`job_id`=`jobs`.`job_id` 
left join (select count(`name`) as `ontime`,`job_hist`.`job_id` from `job_hist` left join `jobs` on `jobs`.`job_id`=`job_hist`.`job_id` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till' and `job_hist`.`days`<=`jobs`.`freq` group by `job_id`) as `myquery2` on `myquery2`.`job_id`=`jobs`.`job_id` 
where `stores`.`owner`='$group' 
group by `stores`.`division`

If anybody has a better solution please do post it.

halfer
  • 19,824
  • 17
  • 99
  • 186
Emil Borconi
  • 3,326
  • 2
  • 24
  • 40