1

I'm currently trying to figure out how I can conditionally insert a row based on a previous row. I'm fairly used to using window functions, and I figure I'll have to do so to make this work, but I don't know any other functions to make this work.

The dataset I'm working with would look like this

Original Table

And what I would want it to look like would be this:

After Modification

So, in effect, I'm looking to add the gap that exists between two row dates. If the end date of one row has a gap between it and the start date of the next row, I would want to be able to insert a row between them that has the same item and store with the in-between dates and a sold amount of 0.

I am trying to perform this in Google BigQuery console.

GMB
  • 216,147
  • 25
  • 84
  • 135
OrangeDave
  • 27
  • 5
  • are you really using BigQuery? You accepted answer that has no chances to run in BigQuery and thus can mislead other users who will look for similar use case! – Mikhail Berlyant Aug 13 '20 at 21:41
  • I really AM using BigQuery. And I have to ask what you thought I might have to gain by asking a question with a need, and then accepting an answer without it actually meeting my need? No, it didn't work when I used dateadd, but it did work when I used date_add. Other than that, it absolutely worked for what I needed, and I'm using it right now. I came here to ask another question that is built on top of my using this. Which is how I discovered your comment. – OrangeDave Aug 27 '20 at 18:13

3 Answers3

2

You can use union all and lead():

select item, store, start, end, sold
from t
union all
select item, store, dateadd(end, interval 1 day), dateadd(next_start, interval -1 day)
from (select item, store, end, lead(start) over (partition item, store start) as next_start
      from t
     ) t
where next_start  dateadd(end, interval 1 day);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    Thanks for this. I don't normally have to do this type of function, but this reminded me of using UNION ALL for this type of thing. I appreciate it. I'll have to look at getting your book! – OrangeDave Aug 13 '20 at 21:35
1

You can use window functions and the insert ... select syntax to create the "missing" rows:

insert into mytable (item, store, start, end, sold)
select 
    item,
    store, 
    dateadd(end, interval 1 day),
    dateadd(lead_start, interval -1 day),
     0
from (
    select 
        t.*,
        lead(start) over(partition by item, store order by start) lead_start
    from mytable t
) t
where lead_start > dateadd(end, interval 1 day)
GMB
  • 216,147
  • 25
  • 84
  • 135
  • I didn't use this one just because I went with the first answer in the attempt to solve my problem, but thinking this through, I feel this would have worked just as well for my problem. Didn't even think to do it this way, and it uses the same functions. 6 of one, half a dozen of the other. – OrangeDave Aug 13 '20 at 21:36
1

Below is for BigQuery Standard SQL

#standardSQL
SELECT * FROM `project.dataset.table` UNION ALL
SELECT * FROM (
  SELECT item, store,
    DATE_ADD(`end`, INTERVAL 1 DAY) new_start,
    DATE_SUB(LEAD(start) OVER(PARTITION BY item, store ORDER BY start), INTERVAL 1 DAY) new_end, 0
  FROM `project.dataset.table` 
)
WHERE new_start <= new_end

if to apply to sample data from your question - the output is

Row item    store   start       end         sold     
1   5       1       2020-01-01  2020-01-15  22   
2   5       1       2020-01-16  2020-01-31  0    
3   5       1       2020-02-01  2020-02-20  14      

In case if you want to insert those "missing" rows into your table - you should use only second select - which is

INSERT INTO `project.dataset.table`
SELECT * FROM (
  SELECT item, store,
    DATE_ADD(`end`, INTERVAL 1 DAY) new_start,
    DATE_SUB(LEAD(start) OVER(PARTITION BY item, store ORDER BY start), INTERVAL 1 DAY) new_end, 0
  FROM `project.dataset.table` 
)
WHERE new_start <= new_end 
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Same as my comment on the next answer; I can see how this would have worked, too, but I chose the answer as the one I selected to try first (which was the first response). Thanks for leaving an answer, too! – OrangeDave Aug 13 '20 at 21:37
  • it is not about who first! it is more which answer correct and out of those which is best or better than others! but at least correct! obviously this is up to you what answer to accept - but you are misleading other users who will be looking for similar use cases – Mikhail Berlyant Aug 13 '20 at 21:39
  • Well, I suppose I'm not qualified to answer what is best. What I meant with my comment was that I see how yours is correct, but the first answer was also correct. They both would have solved my problem. So, I'll take input as to how I should maybe decide what is best. Given my novice level of SQL, I don't see how I would be able to say which of the answers given herein are better than another. – OrangeDave Aug 27 '20 at 19:49
  • sure. don't worry :o) you did all correct :o) it is really up to you what to accept and vote up, etc. – Mikhail Berlyant Aug 27 '20 at 20:04