I'm trying for a while to get a query running, but it is more difficult than I thought. I have the following 3 tables:
Stores:
+--+-------+--------------+-------+---------+---------+----+---------------+---+------+
|ID|Company|Address |PostCod|Latitude |Longitude|Stat| Division |Gro| When |
+--+-------+--------------+-------+---------+---------+----+---------------+---+------+
|17|Company|Site Address 3|WF1 5NT|53.666340|-1.487857|OPEN|Test Division 2|arl|2014-01-31 14:36:04
|18|Company|Site Address 3|WF1 5NT|53.666340|-1.487857|OPEN|Test Division 2|arl|2014-01-31 14:36:04
|19|Company|Site Address 3|WF1 5NT|53.666340|-1.487857|OPEN|Test Division 2|arl|2014-01-31 14:36:04
|20|Company|Site Address 3|WF1 5NT|53.666340|-1.487857|OPEN|Test Division 2|arl|2014-01-31 14:36:04
+--+-------+--------------+-------+---------+---------+----+---------------+---+------+
Jobs:
+--------+-------------------------------------------+-------+------------+------+--------+
| client | description | freq | from | till |job_id |
+--------+-------------------------------------------+-------+------------+------+--------+
| 17 | Weekly external and internal window clean | 7 | 2013-10-01 | NULL | 17 |
| 18 | Weekly external and internal window clean | 7 | 2013-10-01 | NULL | 18 |
| 19 | Weekly external and internal window clean | 7 | 2013-10-01 | NULL | 19 |
| 20 | Weekly external and internal window clean | 7 | 2013-10-01 | NULL | 20 |
| 17 | 4 weekly fascia and upper floor windows | 28 | 2013-10-01 | NULL | 645 |
| 18 | 4 weekly fascia and upper floor windows | 28 | 2013-10-01 | NULL | 646 |
| 19 | 4 weekly fascia and upper floor windows | 28 | 2013-10-01 | NULL | 647 |
| 20 | 4 weekly fascia and upper floor windows | 28 | 2013-10-01 | NULL | 648 |
+--------+-------------------------------------------+-------+------------+------+--------+
Job_hist
+------+-------------------+-----------------+---------+------+--------------+-----------------------------------------+-------------------------------+------+-----+-----------------------+------+
|Job_id| last | user | Company | LINK | Signedy_by | Job Description | Address | Accu |Days | Possition |Uniqid|
+------+-------------------+-----------------+---------+------+--------------+-----------------------------------------+-------------------------------+------+-----+-----------------------+------+
|17 |2013-10-01 09:35:37|ARL_Operative_013|Santander| LINK |tony moore |Weekly external window clean | AN WELLING 14 BR (DA16 3PP) |739.00|10 |Branch Manager |132 |
|20 |2013-10-02 12:27:51|ARL_Operative_013|Santander| LINK |alex goodman |Weekly external window clean | AN HAROLD HILL 69 FR (RM3 8XA)|55.00 |6 |Store Assistant Manager|268 |
|19 |2013-10-03 09:14:19|ARL_Operative_013|Santander| LINK |darren pickett|Weekly external window clean | AN WOOLWICH 41 PS (SE18 6JD) |50.00 |5 |Other |332 |
|18 |2013-10-03 09:54:49|ARL_Operative_013|Santander| LINK |james lawrence|Weekly external window clean | AN ELTHAM 73 EHS (SE9 1UW) |49.00 |7 |Other |346 |
|17 |2013-10-08 09:05:16|ARL_Operative_013|Santander| LINK |tony moore |Weekly external and internal window clean| AN WELLING 14 BR (DA16 3PP) |67.00 |6 |Branch Manager |697 |
+------+-------------------+-----------------+---------+------+--------------+-----------------------------------------+-------------------------------+------+-----+-----------------------+------+
And I need to write a query which does some performance calculation based on divisions. Goal is to:
Calculate how many cleans where scheduled in a given time frame, (simple version) take start date and end date, calculate numbers of day and divide with
stores
.freq
How many of the scheduled cleans where done on time in this period. We take this data from job_hist (where job_hist.jod_id=jobs.job_id and job_hist.last falls in the interval and job_hist.days <= freq
Percentage of the cleans done on time
How many cleans where done late (same logic like in point 2 just job_hist> freq)
Percentage of the cleans done late
How many where missed (scheduled - done on time - done late)
Percentage of missed
All this should be grouped by division, so result should look something like this:
+------------------------------+---------+------+-----------------+---------+--------------------+------+-----------------+
|division |scheduled|ontime|ontime_percentage|completed|completed_percentage|missed|missed_percentage
+------------------------------+---------+------+-----------------+---------+--------------------+------+-----------------+
|fsdfoihsdfljksdlgjdfsligsgfsfd|16282 |10404 |63.90% |10825 |66.48% |5457 |33.52%
|Test Division 2 |259 |129 |49.81% |133 |51.35% |126 |48.65%
|Test Division 3 |30 |15 |50.00% |15 |50.00% |15 |50.00%
+------------------------------+---------+------+-----------------+---------+--------------------+------+-----------------+
Now I'm not too good with SQL queries, but I have managed to put together the following query:
SELECT `stores`.`division`,
SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) as `scheduled`,
count(`name`) as `ontime`,
concat(round(( count(`name`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS ontime_percentage,
count(`user`) as `completed`,
concat(round(( count(`user`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS completed_percentage,
(SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-count(`user`)) as `missed`,
concat(round(( (SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-count(`user`)) /SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS missed_percentage
from `stores` left join `jobs` on `stores`.`id`=`jobs`.`client`
left join (select `user`,`job_id` from `job_hist` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till') as `myquery` on `myquery`.`job_id`=`jobs`.`job_id`
left join (select `name`,`job_hist`.`job_id` from `job_hist` left join `jobs` on `jobs`.`job_id`=`job_hist`.`job_id` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till' and `job_hist`.`days`<=`jobs`.`freq`) as `myquery2` on `myquery2`.`job_id`=`jobs`.`job_id`
where `stores`.`owner`='$group'
group by `division`
The $till, $from, $group are some PHP variables, $till and $from are dates in string formats and $group is a string.
Now the query runs nicely, BUT (there is always a but), the Scheduled field is summing up different values, what I mean sometimes it doubling the value, sometimes it's tripling it.
I have read THIS but I still wasn't able to get my head around it. I more or less understand that I'm some of the results are doubled, but still cannot figure it out what do I need to change.
Before anybody suggest to join on division between stores
.division
and job_hist
.division
that is not viable, since divisions can change in time, and they get updated in stores
but not in job_hist
.job_hist
should not be modified at any time, that is only containing historical data.
Update
OK, I managed to tweak my query and till now it looks that I'm getting the right data, but I need to run more tests to be sure before I post my own answer. Anyway here is my updated query:
SELECT `stores`.`division`,
SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) as `scheduled`,
sum(`ontime`),
concat(round(( sum(`ontime`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS ontime_percentage,
sum(`completed`-`ontime`),
concat(round(( sum(`completed`-`ontime`)/SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS completed_percentage,
(SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-sum(`completed`)) as `missed`,
concat(round(( (SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`)-sum(`completed`)) /SUM(DATEDIFF(LEAST(IFNULL(`till`,CURDATE()),'$till'),GREATEST(`from`,'$from')) DIV `freq`) * 100 ),2),'%') AS missed_percentage from `stores`
left join `jobs` on `stores`.`id`=`jobs`.`client`
left join (select count(`user`) as `completed`,`job_id` from `job_hist` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till' group by `job_id`) as `myquery` on `myquery`.`job_id`=`jobs`.`job_id`
left join (select count(`name`) as `ontime`,`job_hist`.`job_id` from `job_hist` left join `jobs` on `jobs`.`job_id`=`job_hist`.`job_id` where `job_hist`.`last`>='$from' and `job_hist`.`last`<='$till' and `job_hist`.`days`<=`jobs`.`freq` group by `job_id`) as `myquery2` on `myquery2`.`job_id`=`jobs`.`job_id`
where `stores`.`owner`='$group'
group by `stores`.`division`
I'd be open to a better way of running this query.