-1

I have two tables, one table is a list of purchases with revenue, purchase_time and a user id, the other table has a list of a list of campaign clicks with campaign_id, user_id, click_time. campaign_clicks essentially logs all clicks from a campaign, there can any number of clicks or none and these could happen anytime, before or after a purchase but what I need to do is determine which campaign_id was the last campaign to be clicked on before the purchase was made by any given user and what was the total revenue attributed to that campaign_id. I want to only attribute revenue to clicks that occurred within 3 days prior to the purchase.

purchases

date user_id revenue purchase_time
2020/09/01 10 30.0 2020/09/01 10:10:00 am
2020/09/01 20 15.0 2020/09/02 09:15:00 am
2020/09/01 30 25.0 2020/09/02 08:15:00 am

campaign_clicks

user_id campaign_id click_time
10 2 2020/09/01 10:01:00 am
10 1 2020/09/01 10:05:00 am
10 2 2020/09/01 10:20:00 am
20 2 2020/09/01 10:10:00 am
30 2 2020/09/01 07:30:00 am

desired result

date campaign_id revenue
2020/09/01 1 30.0
2020/09/01 2 25.0

purchase from user id 20 shouldn't be included because it occurred before the click_time. User 10 revenue should be attributed to campaign 2 because the click occurred just before the purchase.

My problem is the join I have is returning all the clicks which is inflating the revenue. The select in the inner join isn’t aware of the purchase time, I need to somehow filter and narrow down the clicks to a single click, the last click. I've tried using ROW_NUMBER() to apply an index but that doesn't allow me to filter out clicks that occur after the purchase.

This is where I’m at

SELECT  
  date
  ,ROUND(sum(revenue)) as revenue
  ,campaign_clicks.campaign_id
FROM 
    purchases                    
       
        LEFT JOIN ( 

                   SELECT                                   
                        campaign_id 
                        ,user_id
                        ,click_time                       
                   FROM 
                      campaign_clicks            
                   ORDER BY         
                      click_time DESC                             
                  ) AS clicks ON clicks.user_id = purchases.user_id 
WHERE 
  -- only select campaign clicks that occurred before the purchase                
  purchases.purchase_time > clicks.click_time

  -- only include clicks that occurred within 3 days of the purchase               
  AND DATEDIFF(minutes, clicks.click_time,purchases.purchase_time) <= (60*24*3)

  -- PROBLEM HERE - there can be still a number of other clicks that occurred before the purchase I need to filter to only the last one 
GROUP BY 
   date
 ,clicks.campaign_id
andrew_81
  • 3
  • 2
  • Please in code questions give a [mre]--cut & paste & runnable code; example input with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For errors that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL include DDL & tabular initialization code. When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. [ask] [Help] – philipxy Sep 28 '21 at 02:19
  • An order by without top/limit that is nested has no effect. Tables have no order; result sets [sic] have partial order per an order by. – philipxy Sep 28 '21 at 02:22
  • LEFT JOIN returns INNER JOIN rows UNION ALL unmatched left table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN. After a LEFT JOIN a WHERE, INNER JOIN or HAVING that requires a right [sic] table column to be not NULL removes any rows with introduced NULLs, ie leaves only INNER JOIN rows, ie "turns OUTER JOIN into INNER JOIN". You have that. – philipxy Sep 28 '21 at 02:25
  • 1
    Could you edit and add expected result table from these 2 tables? So we know what you are trying to do. – Nguyễn Văn Quyền Sep 28 '21 at 02:38
  • "cut & paste & runnable code" etc please: [mre]. PS See [How do comment replies work?](https://meta.stackexchange.com/q/43019/266284) to learn to use `@x` to notify one non-sole non-poster commenter `x` re a comment. Posters, sole commenters & followers of posts always get notified. – philipxy Sep 28 '21 at 03:56

2 Answers2

1

Well you can achieve this using the following query. So basically, you can perform a INNER JOIN and filter out dates where the duration crosses 3 days within the ON clause itself.

Now coming to limiting to the last clicked campaign, it can be achieved using ROW_NUMBER function and setting the order of sequence to clicked_time DESC. This way the last clicked date before purchase will have a sequence no. of 1. You can then just filter out the records where row_number is greater than 1 by wrapping the result set in a outer query.

-- Outer query to select just the last click for a any given purchase
SELECT * FROM (
    SELECT p.date, p.purchase_time, c.click_time, c.campaign_id, p.revenue,
-- sequential row number for clicks sorted in descending order of date
    ROW_NUMBER() OVER(PARTITION BY c.user_id ORDER BY c.click_time DESC) AS row_num
    FROM purchases p
    INNER JOIN campaign_clicks c
    ON ( 
       c.user_id = p.user_id 
      --- only select clicks that occured before the purchase
      AND c.click_time<p.purchase_time
      --- only select the clicks that occurred 3 days prior (mins * hours * days )
      AND TIMESTAMPDIFF(MINUTE, c.click_time, p.purchase_time) <= (60*24*3)
    )
) res WHERE res.row_num=1
You can also check the results on the DB-Fiddle link
Salvino D'sa
  • 4,018
  • 1
  • 7
  • 19
  • Restating code in natural language adds nothing & doesn't constitute an explanation of why that code does what is wanted. (Bits of your text explain.) (I'm done.) – philipxy Sep 28 '21 at 05:57
0

Snowflake supports joining laterally. That is, on to a function or correlated sub-query. This allows you to join on to a query that returns just one row (per input row).

SELECT  
  purchases.date
 ,purchases.revenue
 ,clicks.campaign_id
FROM 
  purchases    
LEFT JOIN LATERAL
(
  SELECT
    campaign_id 
   ,user_id
   ,click_time                       
  FROM 
    campaign_clicks
  WHERE
            user_id = purchases.user_id
    -- only select campaign clicks that occurred before the purchase                
    AND click_time <  purchases.purchase_time
    -- only include clicks that occurred within 3 days of the purchase               
    AND click_time >= DATEADD(days, -3, purchases.purchase_time)
  ORDER BY
    click_time DESC
  LIMIT
    1                        
)
  AS clicks
halfer
  • 19,824
  • 17
  • 99
  • 186
MatBailie
  • 83,401
  • 18
  • 103
  • 137