-1

Find total number of unique hackers who made at least submission each day (starting on the first day of the contest), and find the hacker_id and name of the hacker who made maximum number of submissions each day. If more than one such hacker has a maximum number of submissions, print the lowest hacker_id. The query should print this information for each day of the contest, sorted by the date.

Here is the sample data: Hackers table:

15758   Rose
20703   Angela
36396   Frank
38289   Patrick
44065   Lisa
53473   Kimberly
62529   Bonnie
79722   Michael

Submissions table:
Submission_date submission_id hacker_id score
3/1/2016    8494    20703   0
3/1/2016    22403   53473   15
3/1/2016    23965   79722   60
3/1/2016    30173   36396   70
3/2/2016    34928   20703   0
3/2/2016    38740   15758   60
3/2/2016    42769   79722   25
3/2/2016    44364   79722   60
3/3/2016    45440   20703   0
3/3/2016    49050   36396   70
3/3/2016    50273   79722   5
3/4/2016    50344   20703   0
3/4/2016    51360   44065   90
3/4/2016    54404   53473   65
3/4/2016    61533   79722   45
3/5/2016    72852   20703   0
3/5/2016    74546   38289   0
3/5/2016    76487   62529   0
3/5/2016    82439   36396   10
3/5/2016    90006   36396   40
3/6/2016    90404   20703   0 

for the above data, expected results is:
2016-03-01 4 20703 Angela
2016-03-02 2 79722 Michael
2016-03-03 2 20703 Angela
2016-03-04 2 20703 Angela
2016-03-05 1 36396 Frank
2016-03-06 1 20703 Angela

My below query doesnt give me unique hacker_ids

select submission_date, cnt, hacker_id, name from 
(select s.submission_date
, count(s.hacker_id) over(partition by s.submission_date) cnt
, row_number() over(partition by s.submission_date order by s.hacker_id asc) rn
, s.hacker_id, h.name from submissions s
inner join hackers h on h.hacker_id = s.hacker_id) as tble
where tble.rn = 1;

How do I get the unique hacker_ids in the above results ?

David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
  • what about::> select distinct s.hackerid from submissions s inner join hackers h on h.hacker_id = s.hacker_id) as tble where tble.rn = 1; – Eray Balkanli Aug 09 '19 at 15:20
  • This post should have been closed. I understand what you were trying to achieve, however the description of the scenario is erroneous (might be a language barrier). You seem to look for hackers who submitted on the 1st day of the contest. The expected result also seems to be erroneous, assuming the 2nd column represents the number of submissions (you didn't bother to supply column headers). The numbers there do not match the raw data. – David דודו Markovitz Aug 20 '22 at 20:18

4 Answers4

0

For MS SQL

with MaxSubEachDay as (
    select submission_date,
           hacker_id,
           RANK() OVER(partition by submission_date order by SubCount desc, hacker_id) as Rn
    FROM
    (select submission_date, hacker_id, count(1) as SubCount 
     from submissions
     group by submission_date, hacker_id
     ) subQuery
), DayWiseRank as (
    select submission_date,
           hacker_id,
           DENSE_RANK() OVER(order by submission_date) as dayRn
    from submissions
), HackerCntTillDate as (
select outtr.submission_date,
       outtr.hacker_id,
       case when outtr.submission_date='2016-03-01' then 1
            else 1+(select count(distinct a.submission_date)                         from submissions a where a.hacker_id = outtr.hacker_id and                              a.submission_date<outtr.submission_date)
        end as PrevCnt,
        outtr.dayRn
from DayWiseRank outtr
), HackerSubEachDay as (
    select submission_date,
    count(distinct hacker_id) HackerCnt
from HackerCntTillDate
  where PrevCnt = dayRn
group by submission_date
)
select HackerSubEachDay.submission_date,
       HackerSubEachDay.HackerCnt,
       MaxSubEachDay.hacker_id,
       Hackers.name
from HackerSubEachDay
inner join MaxSubEachDay
 on HackerSubEachDay.submission_date = MaxSubEachDay.submission_date
inner join Hackers
 on Hackers.hacker_id = MaxSubEachDay.hacker_id
where MaxSubEachDay.Rn=1
MD TAREK HOSSEN
  • 129
  • 2
  • 11
-1

You can use two levels of aggregation:

select s.submission_date, count(*) as num_hackers, sum(cnt) as num_hacks,
       max(case when seqnum = 1 then h.hacker_id end) as hacker_id,
       max(case when seqnum = 1 then h.name end) as name,
from (select s.submission_date, s.hacker_id, count(*) as cnt
             row_number() over(partition by s.submission_date order by count(*) desc) as seqnum
      from submissions s
      group by s.submission_date, s.hacker_id
     ) s join
     hackers h
     on h.hacker_id = s.hacker_id
group by s.submission_date;

Note that the subquery is aggregating by the date and hacker_id, so there is one row per hacker_id on each date. The count(*) in the outer query is counting these rows, which is the number of hackers. I included the count for the number of hacks.

EDIT:

I realize that you can do an additional analytic function in the subquery and that will simplify the logic a bit:

select s.submission_date, s.num_hackers, num_hacks,
       h.hacker_id, h.name
from (select s.submission_date, s.hacker_id, count(*) as cnt,
             sum(count(*)) over (partition by s.submission_date) as num_hacks,
             count(*) over (partition by s.submission_date) as num_hackers,
             row_number() over(partition by s.submission_date order by count(*) desc) as seqnum
      from submissions s
      group by s.submission_date, s.hacker_id
     ) s join
     hackers h
     on h.hacker_id = s.hacker_id
where seqnum = 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • total number of unique hackers between March 2nd to 5th are 2, as only `20703 & 79722` were consistent from day 1 till March 5th. your query is giving me total count as `3 for Mar 2nd & 3rd, and 4 as for Mar 4th &5th` –  Aug 09 '19 at 15:28
  • @gabs . . . Set up a db<>fiddle, but I see 3 on March 2nd (20703, 79722, 15758), so "3" looks right. – Gordon Linoff Aug 09 '19 at 15:30
  • trying to set up db fiddle, however the unique hackers in this case means the only hackers who have been submitting from day 1. even though (20703, 79722, 15758) had submitted on March 2nd, only 20703, 79722 had submitted from March 1st. –  Aug 09 '19 at 15:36
  • I am new to fiddle, not sure about formatting the date type. Please advise https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=4ef11742f1971e454712c8754025bcc7 –  Aug 09 '19 at 16:09
  • @gabs . . . I am lost on what you are trying to explain. I think this answers the question that you actually asked (unique hackers on each day). It doesn't seem to do what you want. Could you ask another question with a clearer explanation? – Gordon Linoff Aug 10 '19 at 01:21
  • my wordings might be confusing but the definition of Unique hackers per day (in this case) is the hackers who have been submitting at least 1 submission since day one. –  Aug 10 '19 at 01:37
-1
select big_1.submission_date, big_1.hkr_cnt, big_2.hacker_id, h.name
from
(select submission_date, count(distinct hacker_id) as hkr_cnt
from 
(select s.*
, dense_rank() over(order by submission_date) as date_rank
--, row_number() over(order by submission_date) as rn_date_rank
,dense_rank() over(partition by hacker_id order by submission_date) as hacker_rank 
--,row_number() over(partition by hacker_id order by submission_date) as rn_hacker_rank 
from submissions s ) a 
where a.date_rank = a.hacker_rank 
group by submission_date) big_1 
join 
(select submission_date,hacker_id, 
 rank() over(partition by submission_date order by sub_cnt desc, hacker_id) as max_rank 
from (select submission_date, hacker_id, count(*) as sub_cnt 
      from submissions 
      group by submission_date, hacker_id) b ) big_2
on big_1.submission_date = big_2.submission_date and big_2.max_rank = 1 
join hackers h on h.hacker_id = big_2.hacker_id 
order by 1 ;
-1
select tt.submission_date,tt.hacker_count,ts.hacker_id,ts.name
from
(select t2.submission_date,count(t2.hacker_rank) as hacker_count from
(
select submission_date,count(distinct(hacker_id)) as hacker_count,
dense_rank() over(order by submission_date) as date_rank,
dense_rank() over(partition by hacker_id order by submission_date) as 
hacker_rank 
from submissions
group by submission_date,hacker_id
) as t2
where t2.hacker_rank = t2.date_rank
group by submission_date
) as tt
join (
select t1.submission_date,t1.hacker_id,t1.name
from (
select s.submission_date,count(s.hacker_id) as 
count_hacker_id,s.hacker_id,h.name,
ROW_NUMBER() over(PARTITION BY s.submission_date order by count(*) desc) 
as seqnum
from submissions s
left join hackers h 
on h.hacker_id = s.hacker_id
group by s.submission_date,s.hacker_id,h.name
) as t1
where t1.seqnum = 1 ) as ts on ts.submission_date = tt.submission_date;
  • 1
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 29 '22 at 15:46