1

I have a dataset with these columns:

  • Date
  • Code
  • Link
  • Clicks

Each code represents a person, and my goal is to find the first date someone posted each link and how many clicks each link has. To be considered a legitimate post, a link must have a total of at least 100 clicks. This query ALMOST does what I need:

select 
 min(date) as post_date,
    max(date) as last_click,
 link,
  code,
 sum(clicks) as clicks 
from table1
where date >= '2017-01-01'
group by link, code
having sum(clicks) > 100

The problem is this query is giving me the first date ANY clicks registered for links with 100+ clicks. For example:

PersonA got 1 click for linkA on January 1st. On January 7th linkA reaches a total of 100+ clicks. Right now this query is returning January 1st, when I need it to return January 7th.

I hope I was clear enough (again very new to this). Any help would be GREATLY appreciated! :)

clearlight
  • 12,255
  • 11
  • 57
  • 75
vudupins
  • 39
  • 6
  • See http://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query – Strawberry Jan 27 '17 at 07:49

2 Answers2

0

This is tricky. I think you need to use variables for this, in order to get the cumulative sum of the number of clicks (actually, you could use a correlated subquery as well):

select min(date) as post_date, max(date) as last_click,
       link, code,
       max(running_clicks) as clicks 
from (select t1.*,
             (@c := if(@lc = concat_ws(':', link, code), @c + clicks,
                       if(@lc := concat_ws(':', link, code), clicks, clicks)
                      )
             ) as running_clicks
      from table1 t1 cross join
           (select @rn := 0, @lc := '') params
      where t1.date >= '2017-01-01'
      order by link, code
     ) t
group by link, code
having running_clicks - clicks > 100;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Try this:

select
    min(date) post_date,
    max(date) last_click_date,
    link,
    code,
    max(sum_clicks) clicks_total
from (
    select
        t.*,
        @sum := if(@lastcode = code and @lastlink = link, @sum + clicks,
            if(
                (@lastcode := code) is not null and 
                (@lastlink := link) is not null,
                clicks, clicks
            )
        ) sum_clicks
    from (
        select *
        from table1
        order by code, link, date
    ) t cross join (
        select 
            @lastcode := null,
            @lastlink := null,
            @sum      := 0
    ) t2
) t 
where sum_clicks >= 100
group by link, code;

It uses user variables to cumulatively add the clicks in order of date for each user and filters out those rows where the count is not yet reached.

Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76