Dynamic way of counting number of previous rows in a table that satisfy some condition, over certain columns in the table

Question

As a specific example say I have a table T with columns customer and date indicating days on which individual customers have made purchases:

customer |   date   
----------------------  
       A | 01/01/2013 
       A | 02/01/2013
       A | 07/01/2013
       A | 11/01/2013
       B | 03/01/2013
       B | 08/01/2013

I want to add another column that for each pair (customer, date) pair (c, d), gives the number of pairs (c', d') in T such that c = c' and 0 <= days(d) - days(d') <= 7. Below is the table with this extra column:

customer |   date     | new_column
----------------------------------  
       A | 01/01/2013 |          1
       A | 02/01/2013 |          2
       A | 07/01/2013 |          3 
       A | 11/01/2013 |          2
       B | 03/01/2013 |          1
       B | 10/01/2013 |          1

As a rough idea of the steps I used to solve this problem:

create a table T' with all possible pairs (c,d);
left join T onto T';
create a new column: count(date) over (partition by customer order by date asc rows between 6 preceding and 0 following);
omit any rows from this new table where T.date is null

However, I don't think this is scalable.

Cheers for any help.

You can get the information with a query that has a self join. Why do you need a new column in your table? — Dan Bracuk, May 14 '13 at 16:44

score 0 · Answer 1 · edited May 23 '17 at 12:03

Let's start with some DDL. (You'll get more answers and better answers if you include DDL and sample INSERT statements in your questions.)

create table test (
  customer char(1) not null,
  purchase_date date not null,
  primary key (customer, purchase_date)
);

insert into test values
('A', '2013-01-01'),
('A', '2013-01-02'),
('A', '2013-01-07'),
('A', '2013-01-11'),
('B', '2013-01-03'),
('B', '2013-01-10');

In standard SQL, you can use something along these lines. It doesn't require creating another table, outer joins, or window functions. It's not clear whether you have a good reason for wanting to create a new table, but it's not necessary to get the right data. (I renamed the "date" column to avoid a reserved word.)

select t1.customer, t1.purchase_date, count(*) new_column
from test t1
inner join test t2 on t1.customer = t2.customer
and t2.purchase_date <= t1.purchase_date and t2.purchase_date > t1.purchase_date - interval '7 day'
group by t1.customer, t1.purchase_date
order by t1.customer, t1.purchase_date;

customer  purchase_date  new_column
--
A         2013-01-01     1
A         2013-01-02     2
A         2013-01-07     3
A         2013-01-11     2
B         2013-01-03     1
B         2013-01-10     1

Whether this scales well depends on how well DB2 handles non-equi joins. DB2 EXPLAIN will guide you. I'd expect an index on "purchase_date" and a restrictive WHERE clause to perform well.

Dynamic way of counting number of previous rows in a table that satisfy some condition, over certain columns in the table

1 Answers1