0

I have the following table, which has the records of the answers to some academic quizzes:

Respostas

-SEMESTRE_ID (foreign key for semester of the quiz)

-PERGUNTA_ID (foreign key for the question of the quiz)

-DISCIPLINA_ID (foreign key for the subject being evaluated)

-DOCENTE_ID (foreign key for the teacher being evaluated)

-QUIZ_ID (each person answering a quiz has an unique quiz_id)

-TIPOAULA (type of class, doesn't matter for this context)

-DATA (date, doesn't matter for this context)

-ID (unique id of an answer)

-RESPOSTA (value of the answer, from 1 to 5)

So, a quizz of a given semester has multiples questions of different subjects, which can be evaluated by people (don't need to necessarily vote on all the questions).

The query I want to achieve is:

What's the id and acronym of the subjects that had at least a 5 in each one of the questions of the quiz of semester of id 21?

I've solved it using the counting strategy:

SELECT DISCIPLINA.DISCIPLINA_ID, SIGLA
FROM RESPOSTAS, DISCIPLINA
WHERE RESPOSTAS.DISCIPLINA_ID = DISCIPLINA.DISCIPLINA_ID AND SEMESTRE_ID = 21 AND RESPOSTA = 5
GROUP BY DISCIPLINA.DISCIPLINA_ID, SIGLA
HAVING COUNT(DISTINCT PERGUNTA_ID) = (SELECT COUNT(DISTINCT PERGUNTA_ID) FROM RESPOSTAS WHERE SEMESTRE_ID = 21);

But this is for an assignment and we're asked to also solve it by using the double negation strategy, which I can't seem to understand how to achieve the expected result.

Hugo Sousa
  • 1,904
  • 2
  • 15
  • 28
  • possible duplicate? http://stackoverflow.com/questions/15681935/double-negation-sql-query – Joel Gregory Mar 20 '15 at 00:44
  • @JoelGregory True story, saw that, but unfortunately there's no solution for it (using double negation, at least). Seems the assignments still the same, and I'm not the first one with difficulties :) Thanks anyway. – Hugo Sousa Mar 20 '15 at 00:46
  • Can you define the "double negation strategy"? That's not a term I've heard before. You could turn `semestre_id = 21` into `not( semestre_id != 21)`-- that's double negation but that's probably not what you're being asked for. – Justin Cave Mar 20 '15 at 00:54
  • @JustinCave I actually found it weird that Google didn't know much about that, too xD It's using `NOT IN` twice in the query. – Hugo Sousa Mar 20 '15 at 00:56

1 Answers1

3

This looks like a relational division query (if I'm not mistaken) and an example of the double negation that you're looking for is the query below. Basically it can be read as show me all subjects for which there exists no question not also in the set of question with a score of five or more

SELECT DISCIPLINA_ID, SIGLA 
FROM XDISCIPLINA XD
WHERE NOT EXISTS ( -- there can not be any questions ...
    SELECT 1 FROM XRESPOSTAS XR
    WHERE SEMESTRE_ID = 21 
    AND NOT EXISTS ( -- that are not in the set of 5+
       SELECT 1 FROM XRESPOSTAS 
       WHERE PERGUNTA_ID = XR.PERGUNTA_ID 
         AND SEMESTRE_ID = 21
         AND RESPOSTA >= 5
         AND XD.DISCIPLINA_ID = DISCIPLINA_ID
    )
)

An good article on this is: Divided We Stand: The SQL of Relational Division by Joe Celko. It's well worth reading.

EDIT:

I think the query above was performing really bad due to a missing join. This version should perform a lot better:

SELECT DISCIPLINA_ID, SIGLA 
FROM XDISCIPLINA XD
WHERE SEMESTRE_ID = 21 
  AND NOT EXISTS (
    SELECT 1 FROM XRESPOSTAS XR
    WHERE SEMESTRE_ID = 21 AND
    XD.DISCIPLINA_ID = DISCIPLINA_ID AND
      NOT EXISTS (
       SELECT 1 FROM XRESPOSTAS 
       WHERE PERGUNTA_ID = XR.PERGUNTA_ID 
         AND SEMESTRE_ID = 21
         AND RESPOSTA > 4
         AND XD.DISCIPLINA_ID = DISCIPLINA_ID
    )
)

Another way to achieve the same result that seem to perform a whole lot better:

SELECT DISCIPLINA_ID, SIGLA 
FROM XDISCIPLINA XD
WHERE NOT EXISTS (
  SELECT PERGUNTA_ID FROM XRESPOSTAS XR 
    WHERE SEMESTRE_ID = 21 AND XD.DISCIPLINA_ID = DISCIPLINA_ID 
  INTERSECT
  SELECT PERGUNTA_ID FROM XRESPOSTAS XR 
    WHERE SEMESTRE_ID = 21 AND XD.DISCIPLINA_ID = DISCIPLINA_ID AND RESPOSTA < 5
    )
jpw
  • 44,361
  • 6
  • 66
  • 86
  • For some reason, this query seems to be infinite or something, it doesn't stop executing. I'll give a better look at it tomorrow. – Hugo Sousa Mar 20 '15 at 01:39
  • @HugoSousa I didn't test it as I didn't have time to set up suitable test data so it's possible there is some error in it - I tend to get confused when I do this type of query as it's pretty unintuitive. If you add some test data to your question I can test it. – jpw Mar 20 '15 at 01:44
  • It actually seems to be working (http://sqlfiddle.com/#!4/5fad4/2), but in my environment it doesn't stop executing. This is weird... – Hugo Sousa Mar 20 '15 at 02:06
  • Any idea what it might be? My guess was that it could be a huge performance problem, because there's much more data than in the example. I let the query running for more than an hour without results. – Hugo Sousa Mar 20 '15 at 21:27
  • @HugoSousa I don't know. I guess the double negation isn't the best performing construct - how much data are you running the query on? – jpw Mar 20 '15 at 23:26
  • 2.7 million rows on `Respostas` and ~1500 rows on `Disciplina`. I guess it isn't that much actually, but I can't understand what's wrong :/ If I remove the last statement (`AND XD.DISCIPLINA_ID = DISCIPLINA_ID`), the query runs fast (but wrong result, obviously). – Hugo Sousa Mar 20 '15 at 23:29
  • [UPDATE] Without the statement I said, the execution plan cost is 5466. The complete query execution plan cost is 24428669293. I guess that means a loooooot of time. – Hugo Sousa Mar 20 '15 at 23:39
  • @HugoSousa Yeah, that's gonna take some time :o maybe you're missing some index? – jpw Mar 20 '15 at 23:41
  • 1
    The goal of this assignment is actually to analyze the execution plans, but I can't even be sure this query will provide the correct result. This just seems wrong somewhere! Or maybe this is the exact expected behaviour, which I find interestingly weird. Anyway, take my +1. I have same tables with and without indexes, and tried on both (index on `RESPOSTA`, `SEMESTRE_ID`, `DISCIPLINA_ID` and `PERGUNTA_ID` columns). – Hugo Sousa Mar 20 '15 at 23:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/73452/discussion-between-hugo-sousa-and-jpw). – Hugo Sousa Mar 20 '15 at 23:47
  • @HugoSousa I looked it over and made a change that should make it perform a lot better. I also added another version (not a double neg though). – jpw Mar 21 '15 at 01:16
  • @HugoSousa Just curious if you had a chance to try the last version I added? Would be nice to know if it worked. – jpw Mar 31 '15 at 23:50
  • There are some details in the chat I guess you didn't see. The "double negation" way was a bit wrong, but with the fix, and adding `DISTINCT` clause made it pretty slow anyway (above 10 hours if I remember), and it returns 3 additional rows, comparing to my initial version on the OP. The "non double negation" version is clearly wrong, returning 1237 rows, but I didn't take a deep look at it. – Hugo Sousa Apr 02 '15 at 19:15