The data has to be partitioned by id as well as by pageview_date. For each id, the code should find the latest date in the edited_date column that is no later than the row's own pageview_date.
However, it has to consider every edited_date value for that id that falls on or before the pageview_date, NOT ONLY the records for each given day.
Here are the sample data and my current query:
with sample as (
  select 'a' as id, DATE('2022-02-27') as pageview_date, DATE('2022-01-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-02-27') as pageview_date, DATE('2022-03-01') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-03-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-01') as pageview_date, DATE('2022-01-28') as edited_date
  UNION ALL
  select 'a' as id, DATE('2022-03-05') as pageview_date, DATE('2017-02-28') as edited_date
)
SELECT
  id,
  pageview_date,
  MAX(IF(edited_date <= pageview_date, edited_date, null)) OVER (PARTITION BY pageview_date, id) as new_edited_date
FROM sample
The desired output is:
id  pageview_date  new_edited_date
a   2022-02-27     2022-01-28
a   2022-02-27     2022-01-28
a   2022-03-01     2022-03-01
a   2022-03-01     2022-03-01
a   2022-03-05     2022-03-01
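For reference, the window in the query above is partitioned by pageview_date and id, so MAX only sees edited_date values from rows that share the same pageview_date. For 2022-03-05, for example, it returns 2017-02-28 instead of 2022-03-01. One possible approach is to replace the window with a self-join on id followed by an aggregation. The sketch below is illustrative only: the CTE name `latest` and the aliases `p`, `e`, `s`, `l` are hypothetical, and it assumes that joining every pageview_date to all rows of the same id is affordable for the real table size.

```sql
-- Sketch only: reuses the sample data from the question.
WITH sample AS (
  SELECT 'a' AS id, DATE('2022-02-27') AS pageview_date, DATE('2022-01-28') AS edited_date
  UNION ALL SELECT 'a', DATE('2022-02-27'), DATE('2022-03-01')
  UNION ALL SELECT 'a', DATE('2022-03-01'), DATE('2022-03-28')
  UNION ALL SELECT 'a', DATE('2022-03-01'), DATE('2022-01-28')
  UNION ALL SELECT 'a', DATE('2022-03-05'), DATE('2017-02-28')
),
-- Hypothetical helper CTE: one row per distinct (id, pageview_date) holding
-- the latest edited_date from ANY row of the same id that is on or before
-- that pageview_date.
latest AS (
  SELECT
    p.id,
    p.pageview_date,
    MAX(e.edited_date) AS new_edited_date
  FROM (SELECT DISTINCT id, pageview_date FROM sample) AS p
  LEFT JOIN sample AS e
    ON e.id = p.id
   AND e.edited_date <= p.pageview_date
  GROUP BY p.id, p.pageview_date
)
-- Join back to the original rows so duplicate rows are kept.
SELECT
  s.id,
  s.pageview_date,
  l.new_edited_date
FROM sample AS s
JOIN latest AS l
  ON l.id = s.id
 AND l.pageview_date = s.pageview_date
ORDER BY s.pageview_date
```

The LEFT JOIN keeps pageview dates that have no qualifying edited_date and returns NULL for them, as the original IF(..., null) would. The final join back to `sample` restores one output row per input row, matching the five rows in the desired output.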