1

I was doing a MySQL question on Leetcode.(link: https://leetcode.com/problems/get-highest-answer-rate-question/) The question is to find the maximum. I used order by + limit 1 to get the answer. But what if there are multiple maximum? Limit 1 will only return 1 answer.

I tried to use dense_rank() to solve the problem, but I found the outputs are different when I use partition by and group by.

Input: {"headers": {"survey_log": ["id", "action", "question_id", "answer_id", "q_num", "timestamp"]},"rows": {"survey_log": [[5, "show", 285, null, 1, 123], [5, "answer", 285, 124124, 1, 124], [5, "show", 369, null, 2, 125], [5, "skip", 369, null, 2, 126]]}}

Input Image

If my code is:

# Method 1
select question_id, 
dense_rank() over (order by count(case when action = 'answer' then 1 end)/
                            count(case when action = 'show' then 1 end) desc) as num
from survey_log
group by question_id

Then I got output:

Output: {"headers": ["question_id", "num"], "values": [[285, 1], [369, 2]]}

However, when I tried to use partition by to achieve the same effects, the output is not what I want:

# Method 2
select question_id, 
dense_rank() over (partition by question_id 
                   order by count(case when action = 'answer' then 1 end)/
                            count(case when action = 'show' then 1 end) desc) as num
from survey_log
Output: {"headers": ["question_id", "num"], "values": [[285, 1]]}

I don't know why the outputs here are different. Can anyone explain? Thanks in advance!!


Update: I'm sorry that I didn't state the question clearly. The question is to "write a sql query to identify the question which has the highest answer rate."

"The highest answer rate meaning is: answer number's ratio in show number in the same question."

As for the input above, question 285 has answer rate 1/1, while question 369 has 0/1 answer rate, so output 285. Then the output should be: 285 Output

My confusion is why the output of method 2 is different from method 1? Thanks!!

GMB
  • 216,147
  • 25
  • 84
  • 135
Squirrel K
  • 25
  • 3
  • 1
    What is the problem you are trying to solve? I can't access the leetcode link, since it requires creating an account. You would need to explain what you mean by *The question is to find the maximum.*, and show the result that you want for your sample data. – GMB Aug 18 '20 at 20:59
  • 1
    @Emma Thanks for your reply! But your answer won't work if there are multiple maximum values. – Squirrel K Aug 19 '20 at 02:54
  • 1
    @GMB Thanks for your reply! I just clarify my question. Sorry for the confusion. – Squirrel K Aug 19 '20 at 02:55

1 Answers1

0

I would start with a query that computes the answer rate for each question. From your problem statement, that should be:

select
    question_id,
    sum(action = 'answer') / nullif(sum(action = 'show'), 0) answer_rate
from survey_log
group by question_id

You can use that information to rank the questions. You want to rank each question against all other groups, so the window function should not have a partition clause:

select
    question_id,
    rank() over(order by sum(action = 'answer') / nullif(sum(action = 'show'), 0) desc) rn
from survey_log
group by question_id
order by rn
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thanks for reply! But my main confusion is why the output of method 2 is different from method 1? I still cannot understand. – Squirrel K Aug 20 '20 at 02:14