SQL - check if a date is within 14d in another column and return the most recent date

Question

I have a data set that has the following columns: user_id, A_view_dt, A_conversion_dt B_view_dt.

I wanted to check If A_conversion_dt IS NOT NULL, then see if there's another record in B_view_dt that belongs to the same user_id happened 14d prior to A_conversion_dt, if so return the B_view_dt value. Otherwise return NULL under B_view_dt.

IF A_conversion_dt IS NULL, then I'd like to compare A_view_dt and B_view_dt. If a B_view_dt record exists and happens 14d before then keep the most recent max(B_view_dt).

IF B_view_dt IS NULL then I'd like B_view_dt to return NULL.

I have attached a dummy dataset & desired output. I've been trying using UNION ALL and LEFT JOINs but there's always records being missing from the WHERE clause. Anybody has a better solve? Does using partition in this case help?

Thank you!!!

See attachment here

Does `user_id` have only 1 `A_conversion_dt` and `A_view_dt`? This affects the query significantly. — Osa E, Mar 24 '22 at 18:42
@OsaE Hi there! User_id could have multiple A_conversion_dt in the data set. — daniellerobinson7836, Mar 24 '22 at 18:51
@shawnt00 Hey there! Thank you for flagging. User F should have 3/9 as B_view_dt as 3/9 happens 14d before A_conversion_dt and is the most recent date. Fixed the screenshot in my post. — daniellerobinson7836, Mar 24 '22 at 18:53
You need to define which of the `B_view_dt` values is the one that gets returned in each scenario. I also don't find 3/16 among the dates for user C. — shawnt00, Mar 24 '22 at 19:09

shawnt00 · Answer 1 · 2022-03-24T21:48:37.230

select user_id,
    min(A_view_dt) as A_view_dt, min(A_conversion_dt) as A_conversion_dt,
    max(case when
        datediff(day, B_view_dt, coalesce(A_conversion_dt, A_view_dt)) between 0 and 14
            then B_view_dt end) as B_view_dt
from T
group by user_id

It appears safe to make the assumption that all rows (per user) have the same values in the first two date columns so I am running with that in this query.

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=764c2857cc69190fca3fbd08de3e544c

Or perhaps you want all the date combinations to appear separately?:

select user_id, A_view_dt, A_conversion_dt,
    max(case when
        datediff(day, B_view_dt, coalesce(A_conversion_dt, A_view_dt)) between 0 and 14
            then B_view_dt end) as B_view_dt
from T
group by user_id, A_view_dt, A_conversion_dt

Hi there! Thank you for the answer. It does seem like a user_id could have multiple A_view_dt and A_conversion_dt's. However the combination of (A_view_dt and A_conversion_dt ) is always unique. — daniellerobinson7836, Mar 24 '22 at 19:00

Osa E · Accepted Answer · 2022-03-24T19:56:52.247

I doubt there is an efficient way to run this without some preprocessing especially for large dataset.

Basically you need 2 virtual datasets.

Unique user_id and choice of conversion/view date. Let's call this A_dt. In my result, I added cutoff to help with some debugging and make query simpler. Call this a
Unique user_id and B_View_dt. Call this b

Select all data from a then find max B_View_dt within range of cutoff and A_dt.

Using t-sql, I use CTE to help create virtual tables. See link to test solution - http://sqlfiddle.com/#!18/4db34/5

  ;with a (user_id, A_dt, cutoff) as 
    (   
      select
        user_id,
        isnull(A_Conversion_dt,
        A_view_dt) A_dt,
        dateadd(d,-14, isnull(A_Conversion_dt,A_view_dt)) cutoff   
    from
        t   
    group by
        user_id,
        isnull(A_Conversion_dt,
        A_view_dt) 
    ), 
    b (user_id, B_dt) as 
    (
      select
        user_id,
        B_View_dt B_dt   
    from
        t     
    group by
        user_id,
        B_View_dt 
    ) 
    
    select
        a.*,
        MaxB_dt 
    from
        a 
    outer apply 
    (   
      select
        max(B_dt) MaxB_dt   
      from
            b    
      where
        a.user_id = b.user_id     
        and B_dt between cutoff and A_dt  
    ) ob

Sample Data

Result from query

SQL - check if a date is within 14d in another column and return the most recent date

2 Answers2