How to avoid groups but require a minimum count?

Question

I have answered and read many question on getting the greatest-n-per-group but now find myself needing the opposite.

I have a result set that shows students, date, and project that represent which students worked on a project on a given day.

I would like to see rows where multiple students worked on a project for that day. So if my result set looks like this:

| student |    date    | project |
+---------+------------+---------+
|    1    | 2014-12-04 |    1    |
|    2    | 2014-12-04 |    1    |
|    3    | 2014-12-04 |    1    |
|    1    | 2014-12-03 |    1    |

I would only like to see the first three rows, so I can see that students 1,2,3 worked together on the same project on the same day. I could filter like this:

GROUP BY date, project
HAVING COUNT(*) > 1

But then only one row will be returned.

So you want to see all the students, dates, and projects who have worked on a project so long as more than 1 student worked on it? — xQbert, Dec 04 '14 at 18:48
@xQbert Yes, but also on the same day. If student 1 worked on it on 12/03, and student 2 worked on it on 12/02, I don't want to see that because there was only one student on that particular day. — AdamMc331, Dec 04 '14 at 18:49

score 2 · Accepted Answer · answered Dec 04 '14 at 18:49

2

you can use your existing query as subquery and get the results

SQL FIDDLE DEMO

SELECT * from Table1 T1
JOIN
(
  SELECT date, project
  from table1
  group by date, project
  having count(*) >1
) t
on t1.date = t.date
and t1.project = t.project

answered Dec 04 '14 at 18:49

radar

13,270
2
25
33

Can you see any particular reason why this would be inefficient? I mean, I'm dealing with a small database by comparison (We have less than 1000 projects, less than 200 students, and I don't expect date to grow much as I will filter on recent projects). I only hesitate at the idea of a sort of `self join`. – AdamMc331 Dec 04 '14 at 18:52
@McAdam331, we need a self join , as long as index exists on date, project columns, it should be good. – radar Dec 04 '14 at 18:54
Yeah, I trust our DBA to design this well. Everything is indexed I used shorthand to get my point across, but I'm already joining about 4 DB tables just to get person, date, and project. – AdamMc331 Dec 04 '14 at 18:55
1

Do not worry about response times of queries until someone complains. ;-/ – Ryan Vincent Dec 04 '14 at 18:55
@RyanVincent Do not worry about response times so long as they take less than 1-3 seconds now. If it takes 10 seconds to run and you give that to the user, you're likely doing something wrong. There are some exceptions like overnight process or canned reports or what not where performance may not be as critical, but if this SQL interfaces directly with users in an app, their expectation is generally < 1 second. So if that's the case, the "Somone" should be the person writing the SQL, not the end user. – xQbert Dec 04 '14 at 19:04
In regards to response times, running this on our test database ran extremely quickly, less than a second. Hopefully production will be the same, which I expect. – AdamMc331 Dec 04 '14 at 19:05

score 1 · Answer 2 · answered Dec 04 '14 at 18:53

1

This should work.

I think of the table as two sets of data and join them based on date and project and not the same student.

This way if any records exist after the join, we know that they have the same project and date but not for the same student. Group the results ... and you have what you're after.

SELECT A.student, A.date, A.project
from table a
INNER JOIN table b
on A.date=B.Date 
and A.Project=B.Project
and a.student<> b.student
group by A.student, a.date, a.project

answered Dec 04 '14 at 18:53

xQbert

34,733
2
41
62

This is not a bad idea, but It would be a bit more difficult for me to implement. `student` and `project` are aliases for my real world database, which has 4 joins to get from student to date to project. – AdamMc331 Dec 04 '14 at 18:55

How to avoid groups but require a minimum count?

2 Answers2

Linked