0

As a specific example say I have a table T with columns customer and date indicating days on which individual customers have made purchases:

customer |   date   
----------------------  
       A | 01/01/2013 
       A | 02/01/2013
       A | 07/01/2013
       A | 11/01/2013
       B | 03/01/2013
       B | 08/01/2013       

I want to add another column that for each pair (customer, date) pair (c, d), gives the number of pairs (c', d') in T such that c = c' and 0 <= days(d) - days(d') <= 7. Below is the table with this extra column:

customer |   date     | new_column
----------------------------------  
       A | 01/01/2013 |          1
       A | 02/01/2013 |          2
       A | 07/01/2013 |          3 
       A | 11/01/2013 |          2
       B | 03/01/2013 |          1
       B | 10/01/2013 |          1

As a rough idea of the steps I used to solve this problem:

  • create a table T' with all possible pairs (c,d);
  • left join T onto T';
  • create a new column: count(date) over (partition by customer order by date asc rows between 6 preceding and 0 following);
  • omit any rows from this new table where T.date is null

However, I don't think this is scalable.

Cheers for any help.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
user32259
  • 1,113
  • 3
  • 13
  • 21

1 Answers1

0

Let's start with some DDL. (You'll get more answers and better answers if you include DDL and sample INSERT statements in your questions.)

create table test (
  customer char(1) not null,
  purchase_date date not null,
  primary key (customer, purchase_date)
);

insert into test values
('A', '2013-01-01'),
('A', '2013-01-02'),
('A', '2013-01-07'),
('A', '2013-01-11'),
('B', '2013-01-03'),
('B', '2013-01-10');

In standard SQL, you can use something along these lines. It doesn't require creating another table, outer joins, or window functions. It's not clear whether you have a good reason for wanting to create a new table, but it's not necessary to get the right data. (I renamed the "date" column to avoid a reserved word.)

select t1.customer, t1.purchase_date, count(*) new_column
from test t1
inner join test t2 on t1.customer = t2.customer
and t2.purchase_date <= t1.purchase_date and t2.purchase_date > t1.purchase_date - interval '7 day'
group by t1.customer, t1.purchase_date
order by t1.customer, t1.purchase_date;

customer  purchase_date  new_column
--
A         2013-01-01     1
A         2013-01-02     2
A         2013-01-07     3
A         2013-01-11     2
B         2013-01-03     1
B         2013-01-10     1

Whether this scales well depends on how well DB2 handles non-equi joins. DB2 EXPLAIN will guide you. I'd expect an index on "purchase_date" and a restrictive WHERE clause to perform well.

Community
  • 1
  • 1
Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185