3

I have three tables that are all inter-related with the following structure.

ModuleCategory Table:

+------------------+----------------+------------+
| ModuleCategoryID | ModuleCategory | RequireAll |
+------------------+----------------+------------+
|               90 | Cat A          | YES        |
|               91 | Cat B          | NO         |
+------------------+----------------+------------+

ModuleCategorySkill Table:

+------------------+---------+
| ModuleCategoryID | SkillID |
+------------------+---------+
|               90 |    1439 |
|               90 |    3016 |
|               91 |    1440 |
|               91 |    3016 |
+------------------+---------+

EmployeeSkill Table:
+---------+---------+
| EmpName | SkillID |
+---------+---------+
| Emp1    |    1439 |
| Emp1    |    3016 |
| Emp2    |    1440 |
| Emp2    |    3016 |
| Emp3    |    1439 |
| Emp4    |    3016 |
+---------+---------+

Desired Output:

+------------------+-------+
| ModuleCategory   | Count |
+------------------+-------+
|            Cat A |     1 |
|            Cat B |     3 |
+------------------+-------+

I am trying to group by ModuleCategoryID's and get the count of employees which have the skills being tracked.

Normally, I can do the following query to obtain the numbers:

select mc.ModuleCategory, Count(*) as Count from ModuleCategory as mc 
join ModuleCategorySkill as mcs on mc.ModuleCategoryID = mcs.ModuleCategoryID join EmployeeSkill as es on es.SkillID= mcs.SkillID 
group by mc.ModuleCategoryID

However, I have a column RequireAll in the ModuleCategory table which if it is set to 'YES' should only count employees as 1 only if they have all the skills in the category. If it is set to NO then it can count each row normally and increase the count by the number of rows it groups by.

I can achieve this by writing separate queries for each modulecategoryID and using a having Count() > 1 (which will find me anyone that has all the skills for ModuleCategoryID 90). If there were 3 skills than I would have to change it to Having Count() > 2. If there isn't anyone that has all the skills specified, the count should be 0.

I need a dynamic way of being able to do this since there is a lot of data and writing one query for each ModuleCategoryID isn't the proper approach.

Also, I am using PHP so I can loop through and create a sql string that can help me achieve this. But I know I will run into performance issues on big tables with a lot of skills and modulecategoryID's.

Any guidance on how to achieve this is much appreciated.

nomistic
  • 2,902
  • 4
  • 20
  • 36
celik
  • 75
  • 9

1 Answers1

1

You can do it by joining on the total category counts, and then using conditional aggregation:

select modulecategory, 
       count(case when requireall = 'yes'
               then if(s = t, 1, null)
               else s
             end) 
from (             
select modulecategory,empname, requireall, count(*) s, min(q.total) t
  from employeeskill e
    inner join modulecategoryskill mcs
      on e.skillid = mcs.skillid
    inner join modulecategory mc
      on mcs.modulecategoryid = mc.modulecategoryid
    inner join (
      select modulecategoryid, count(*) total
        from modulecategoryskill
        group by modulecategoryid
    ) q
    on mc.modulecategoryid = q.modulecategoryid
  group by modulecategory, empname
  ) qq
group by modulecategory;

demo here

This operates under the assumption an employee isn't going to be allocated the same skill twice, if that is something that may happen, this query is alterable to support it, but it seems like a broken scenario to me.

What we have here is an inner query that collates all the information we need (category name, employee name, whether or not all skills are required, how many skills are in the group per employee, and how many there in the group total), with an outer query that uses a conditional count to change how the rows are tallied, based on the value of requireall.

pala_
  • 8,901
  • 1
  • 15
  • 32
  • Hi pala_, thank you for the quick response and solution! The above query definitely works, however I wanted to check to see if it is the most efficient way because when I run the queries separately and do a union as I have suggested, the query duration is 0.811 secs ( I understand that it won't be as dynamic) and if I do your approach it takes about 6 seconds. Since I have only given the bare minimum columns and joins for demonstration, my main concern would be as I add more logic, the duration will increase. Thanks again for your solution! Looking forward to any other suggestions you may have – celik Jun 09 '15 at 01:16
  • @celik your indexing may not me up to snuff if the above is particularly slow. are you able to provide an `explain` output for the query? – pala_ Jun 09 '15 at 01:57
  • I played around and re-wrote some of the joins that made more sense. I was able to get the same result in 0.171 secs!! Thank you, your logic was definitely the correct answer. – celik Jun 09 '15 at 02:33